Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread eks dev
Yes, I consciously let my slaves run away from the master in order to
reduce update latency, but every now and then they sync up with the master,
which does the heavy lifting.

The price you pay is that slaves do not see the same documents as the
master, but that is the case with replication anyhow. In my setup a
slave may go ahead of the master with updates; this delta gets zeroed
after replication and the game starts again.

What you have to take into account is the very small time window
where you may "go back in time" on slaves (not seeing documents that
were already there), but we are talking about seconds and a couple out
of 200 million documents (only those documents that were soft-committed on
the slave during replication, between the commit on the master and postCommit
on the slave).

Why do you think something is strange here?

> What are you expecting a BeforeCommitListener could do for you, if one
> would exist?
Why should I be expecting something?

I just need to read the user commit data as soon as replication is done,
and I am looking for a proper/easy way to do it (a postCommit listener is
what I use now).
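
For illustration, a minimal sketch of that read, done against the latest
commit point via the deletion policy (the same route mentioned further down
in this thread) rather than against the possibly stale newest searcher; the
helper class and the user-data key are illustrative, not existing Solr code:

import java.util.Map;
import org.apache.lucene.index.IndexCommit;
import org.apache.solr.core.SolrCore;

// Call this from an event listener's postCommit(): it reads userData from the
// latest commit point instead of from core.getNewestSearcher(), which at
// postCommit time may still belong to the pre-replication index.
public class CommitUserData {
  public static Map<String, String> readLatest(SolrCore core) {
    try {
      IndexCommit latest = core.getDeletionPolicy().getLatestCommit();
      if (latest == null) {
        return null;
      }
      Map<String, String> userData = latest.getUserData();
      // e.g. userData.get("incremental_update_mark") -- illustrative key name
      return userData;
    } catch (Exception e) { // IndexCommit.getUserData() can throw IOException
      return null;
    }
  }
}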

What makes me slightly nervous are those life-cycle questions, e.g. if I
issue an update command before or after the postCommit event, which
index gets updated: the one just replicated, or the one that was there
just before replication?

There are definitely ways to optimize this, for example forcing the
replication handler to copy only delta files if the index gets updated on
both slave and master (there is already a TODO for this somewhere on the Solr
replication wiki, I think). Right now the ReplicationHandler copies the
complete index when this is detected ...

I am all ears if there are better proposals for low-latency
updates in a multi-server setup...


On Tue, Feb 21, 2012 at 11:53 PM, Em  wrote:
> Eks,
>
> that sounds strange!
>
> Am I getting you right?
> You have a master which indexes batch-updates from time to time.
> Furthermore you got some slaves, pulling data from that master to keep
> them up-to-date with the newest batch-updates.
> Additionally your slaves index own content in soft-commit mode that
> needs to be available as soon as possible.
> In consequence the slavesare not in sync with the master.
>
> I am not 100% certain, but chances are good that Solr's
> replication-mechanism only changes those segments that are not in sync
> with the master.
>
> What are you expecting a BeforeCommitListener could do for you, if one
> would exist?
>
> Kind regards,
> Em
>
> Am 21.02.2012 21:10, schrieb eks dev:
>> Thanks Mark,
>> Hmm, I would like to have this information asap, not to wait until the
>> first search gets executed (depends on user) . Is solr going to create
>> new searcher as a part of "replication transaction"...
>>
>> Just to make it clear why I need it...
>> I have simple master, many slaves config where master does "batch"
>> updates in big chunks (things user can wait longer to see on search
>> side) but slaves work in soft commit mode internally where I permit
>> them to run away slightly from master in order to know where
>> "incremental update" should start, I read it from UserData 
>>
>> Basically, ideally, before commit (after successful replication is
>> finished) ends, I would like to read in these counters to let
>> "incremental update" run from the right point...
>>
>> I need to prevent updating "replicated index" before I read this
>> information (duplicates can appear) are there any "IndexWriter"
>> listeners around?
>>
>>
>> Thanks again,
>> eks.
>>
>>
>>
>> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
>>> Post commit calls are made before a new searcher is opened.
>>>
>>> Might be easier to try to hook in with a new searcher listener?
>>>
>>> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>>>
 Hi all,
 I am a bit confused with IndexSearcher refresh lifecycles...
 In a master slave setup, I override postCommit listener on slave
 (solr trunk version) to read some user information stored in
 userCommitData on master

 --
 @Override
 public final void postCommit() {
   // This returns "stale" information that was present before
   // replication finished
   RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
   Map<String, String> userData =
       refC.get().getIndexReader().getIndexCommit().getUserData();
 }
 
 I expected core.getNewestSearcher(true); to return refreshed
 SolrIndexSearcher, but it didn't

 When is this information going to be refreshed to the status from the
 replicated index, I repeat this is postCommit listener?

 What is the way to get the information from the last commit point?

 Maybe like this?
 core.getDeletionPolicy().getLatestCommit().getUserData();

 Or I need to explicitly open new searcher (isn't solr does this behind
 the scenes?)
 core.openNewSearcher(false, false)

 Not critical, reopening new searcher works, but I would like to
 understand these lifecycles, when sol

Re: Unique key constraint and optimistic locking (versioning)

2012-02-22 Thread Per Steffensen
Thanks a lot. We will use the UniqueKey feature and build versioning 
ourselves. Do you think it would be a good idea if we built a versioning 
feature into Solr/Lucene instead of doing it outside, so that others can 
benefit from the feature as well? I guess contributions are to be made 
according to http://wiki.apache.org/solr/HowToContribute. Is it possible 
for "outsiders" (like us) to get an SVN branch at svn.apache.org to 
prepare contributions, or do we have to use our own SVN? Are there any 
plans to migrate the Lucene/Solr codebase to Git, which would make it easier 
to get a "separate area" to work on the code (making a Git fork) and to 
suggest the contribution back to core Lucene/Solr (via a Git "pull 
request")?


Thanks!
Per Steffensen

Em skrev:

Hi Per,

Solr provides the so called "UniqueKey"-field.
Refer to the Wiki to learn more:
http://wiki.apache.org/solr/UniqueKey

  

Optimistic locking (versioning)


... is not provided by Solr out of the box. If you add a new document
with the same UniqueKey it replaces the old one.
You have to do the versioning on your own (and keep in mind concurrent
updates).
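
For illustration, a rough client-side sketch of that scheme with SolrJ; the
field names (id, body, version_l) are assumptions, and the check-then-write
below is only safe if the application serializes updates per document:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class OptimisticUpdate {
  // Returns false if the stored version no longer matches the client's copy.
  public static boolean update(SolrServer server, String id, long clientVersion,
                               String newBody) throws Exception {
    QueryResponse rsp = server.query(new SolrQuery("id:" + id));
    if (rsp.getResults().isEmpty()) {
      return false;
    }
    SolrDocument stored = rsp.getResults().get(0);
    long storedVersion = ((Number) stored.getFieldValue("version_l")).longValue();
    if (storedVersion != clientVersion) {
      return false; // somebody else updated the document in the meantime
    }
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", id);
    doc.addField("body", newBody);
    doc.addField("version_l", clientVersion + 1);
    server.add(doc);  // replaces the old document via the uniqueKey
    server.commit();
    return true;
  }
}

Note that Solr itself does not make the read-check-write atomic, so two
clients racing between the query and the add can still both "succeed".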

Kind regards,
Em

Am 21.02.2012 13:50, schrieb Per Steffensen:
  

Hi

Does solr/lucene provide any mechanism for "unique key constraint" and
"optimistic locking (versioning)"?
Unique key constraint: That a client will not succeed creating a new
document in solr/lucene if a document already exists having the same
value in some field (e.g. an id field). Of course implemented right, so
that even though two or more threads are concurrently trying to create a
new document with the same value in this field, only one of them will
succeed.
Optimistic locking (versioning): That a client will only succeed
updating a document if this updated document is based on the version of
the document currently stored in solr/lucene. Implemented in the
optimistic way that clients during an update have to tell which version
of the document they fetched from Solr and that they therefore have used
as a starting-point for their updated document. So basically having a
version field on the document that clients increase by one before
sending to solr for update, and some code in Solr that only makes the
update succeed if the version number of the updated document is exactly
one higher than the version number of the document already stored. Of
course again implemented right, so that even though two or more thrads
are concurrently trying to update a document, and they all have their
updated document based on the current version in solr/lucene, only one
of them will succeed.

Or do I have to do stuff like this myself outside solr/lucene - e.g. in
the client using solr.

Regards, Per Steffensen




  




Re: Unique key constraint and optimistic locking (versioning)

2012-02-22 Thread Per Steffensen

Per Steffensen skrev:
Thanks a lot. We will use the UniqueKey feature and build versioning 
ourselves. Do you think it would be a good idea if we built a 
versioning feature into Solr/Lucene instead of doing it outside, so 
that others can benefit from the feature as well? Guess contributions 
will be made according to http://wiki.apache.org/solr/HowToContribute. 
It is possible for "outsiders" (like us) to get a SVN branch at 
svn.apache.org to prepare contributions, or do we have to use our own 
SVN? Are there any plans migrating lucene/solr codebase to Git, which 
will make it easier getting a "separate area" to work on the code 
(making a Git fork), and suggest the contribution back to core 
lucene/solr (doing a Git "pull request")?
Sorry - didn't see the "Eclipse (using Git)" chapter on 
http://wiki.apache.org/solr/HowToContribute. We might contribute in that 
area.


Thanks!
Per Steffensen




Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread Em
Sounds much clearer to me than before. :)

Ad-hoc I have two ideas:
First: Let Replication run asynchronously.
If shard1 is pulling the new index from the master and therefore very
recent documents aren't available anymore, shard2 will find them in the
meantime. As soon as shard1 is up-to-date (including the most recent
documents), shard2 can pull its update from the master.
However, being out of sync between two shards that should serve the same
data has its own problems, I think.

Second:
You can have another SolrCore for the most recent documents. This one
could be based on a RAMDirectory for reduced latency (or even use
NRT features, if available in your Solr version).
Your master-slave setup becomes easier, since you do not have to
worry about out-of-sync scenarios anymore.
The challenge here is to handle duplicate documents (i.e. newer versions
in the RAMDirectory) and to keep relevancy sensible, since the shards are
unbalanced by design.
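
(Serving the two cores together would typically be one distributed request,
e.g. shards=localhost:8983/solr/main,localhost:8983/solr/recent; the host and
core names here are only illustrative.)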

Kind regards,
Em


Am 22.02.2012 09:25, schrieb eks dev:
> Yes, I consciously let my slaves run away from the master in order to
> reduce update latency, but every now and then they sync up with master
> that is doing heavy lifting.
> 
> The price you pay is that slaves do not see the same documents as the
> master, but this is the case anyhow with replication, in my setup
> slave may go ahead of master with updates, this delta gets zeroed
> after replication and the game starts again.
> 
> What you have to take into account with this is very small time window
> where you may "go back in time" on slaves (not seeing documents that
> were already there), but we are talking about seconds and a couple out
> of 200Mio documents (only those documents that were softComited on
> slave during replication, since commit ond master and postCommit on
> slave).
> 
> Why do you think something is strange here?
> 
>> What are you expecting a BeforeCommitListener could do for you, if one
>> would exist?
> Why should I be expecting something?
> 
> I just need to read userCommit Data as soon as replication is done,
> and I am looking for proper/easy way to do it.  (postCommitListener is
> what I use now).
> 
> What makes me slightly nervous are those life cycle questions, e.g.
> when I issue update command before and after postCommit event, which
> index gets updated, the one just replicated or the one that was there
> just before replication.
> 
> There are definitely ways to optimize this, for example to force
> replication handler to copy only delta files if index gets updated on
> slave and master  (there is already todo somewhere on solr replication
> Wiki I think). Now replicationHandler copies complete index if this
> gets detected ...
> 
> I am all ears if there are better proposals to have low latency
> updates in multi server setup...
> 
> 
> On Tue, Feb 21, 2012 at 11:53 PM, Em  wrote:
>> Eks,
>>
>> that sounds strange!
>>
>> Am I getting you right?
>> You have a master which indexes batch-updates from time to time.
>> Furthermore you got some slaves, pulling data from that master to keep
>> them up-to-date with the newest batch-updates.
>> Additionally your slaves index own content in soft-commit mode that
>> needs to be available as soon as possible.
>> In consequence the slavesare not in sync with the master.
>>
>> I am not 100% certain, but chances are good that Solr's
>> replication-mechanism only changes those segments that are not in sync
>> with the master.
>>
>> What are you expecting a BeforeCommitListener could do for you, if one
>> would exist?
>>
>> Kind regards,
>> Em
>>
>> Am 21.02.2012 21:10, schrieb eks dev:
>>> Thanks Mark,
>>> Hmm, I would like to have this information asap, not to wait until the
>>> first search gets executed (depends on user) . Is solr going to create
>>> new searcher as a part of "replication transaction"...
>>>
>>> Just to make it clear why I need it...
>>> I have simple master, many slaves config where master does "batch"
>>> updates in big chunks (things user can wait longer to see on search
>>> side) but slaves work in soft commit mode internally where I permit
>>> them to run away slightly from master in order to know where
>>> "incremental update" should start, I read it from UserData 
>>>
>>> Basically, ideally, before commit (after successful replication is
>>> finished) ends, I would like to read in these counters to let
>>> "incremental update" run from the right point...
>>>
>>> I need to prevent updating "replicated index" before I read this
>>> information (duplicates can appear) are there any "IndexWriter"
>>> listeners around?
>>>
>>>
>>> Thanks again,
>>> eks.
>>>
>>>
>>>
>>> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
 Post commit calls are made before a new searcher is opened.

 Might be easier to try to hook in with a new searcher listener?

 On Feb 21, 2012, at 8:23 AM, eks dev wrote:

> Hi all,
> I am a bit confused with IndexSearcher refresh lifecycles...
> In a master 

SIREn integration with SOLR

2012-02-22 Thread chitra
Hi,

   We would like to implement semantic search on our websites. We
already have a full-text search service using SOLR. We heard that the SIREn
plug-in for SOLR is able to index & query semi-structured data.

Could any one of you provide me with more details about SIREn, its integration
with SOLR and how to use it with PHP?

Thanks in advance...

Regards
Chitra 



Fields, Facets, and Search Results

2012-02-22 Thread drLocke97
I'm new to SOLR and trying to get a proper understanding of what's going on
with fields, facets, and search results.

I've modified the example schema.xml and solrconfig.xml that come with SOLR
to reflect some fields I want to experiment with. I've also modified the
velocity templates in Solritas accordingly. I've created some sample documents
to post to the index that have the fields/data I want to experiment with.
Everything "compiles" and works, but my search results are not what I expect
and I'm trying to understand why.

Every field I have is defined the same (here are two examples):




The puzzle is that I'm getting search results on every term that's in the
"title" field, but nothing on terms in the "section_text_content" field. I
have no idea why. I thought at first it was because I'd specified the title
field to also be a facet, but I removed that and things remain as described
(except now, of course, the facet for "title" is gone).

Can anyone provide some insight?

Don



how to mock solr server solr_sruby

2012-02-22 Thread solr
Hi,
I am using solr-ruby in my Ruby code, and for that I am starting the Solr
server using start.jar.
Now I want to write mock objects for the Solr connection and for the code in
my Ruby file that searches data from Solr.
Can anybody suggest how to do testing without starting the Solr server?



Re: Fields, Facets, and Search Results

2012-02-22 Thread darul
Check your schema config file first.

It looks like you have missed copying the "section_text_content" field's
content to your default search field:

<copyField source="section_text_content" dest="text"/>
<defaultSearchField>text</defaultSearchField>




'location' fieldType indexation impossible

2012-02-22 Thread Xavier
Hi,

When I try to index my location field I get this error for each document:
*ATTENTION: Error creating document  Error adding field
'emploi_city_geoloc'='48.85,2.5525'*
(so I have 0 documents indexed)

Here is my schema.xml:

I really don't understand why it isn't working, because it was working on my
local server with the same configuration (Solr 3.5.0) and the same database!!!

If I try to use "geohash" instead of "location", indexing works, but my
geodist query on the front end isn't working anymore ...

Any ideas ?

Best regards,
Xavier



Re: Date filter query

2012-02-22 Thread ku3ia
Hi, all
Thanks for your responses.

I had tried
[NOW/DAY-30DAY+TO+NOW/DAY-1DAY-1SECOND]
and it seems to work fine for me.
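
(Decoded from the URL-escaped form and attached to an assumed date field, the
full filter would read fq=mydate_dt:[NOW/DAY-30DAY TO NOW/DAY-1DAY-1SECOND];
the field name is illustrative.)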

Thanks a lot!



How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Dear all,

I wonder how data in HBase is indexed? Solr is used in my system now
because data is managed in an inverted index. Such an index is suitable for
retrieving unstructured data in huge amounts. How does HBase deal with this
issue? May I replace Solr with HBase?

Thanks so much!

Best regards,
Bing


Re: Fast Vector Highlighter Working for some records only

2012-02-22 Thread dhaivat

Koji Sekiguchi wrote
> 
> (12/02/22 11:58), dhaivat wrote:
>> Thanks for reply,
>>
>> But can you please tell me why it's working for some documents and not
>> for
>> other.
> 
> As Solr 1.4.1 cannot recognize hl.useFastVectorHighlighter flag, Solr just
> ignore it, but due to hl=true is there, Solr tries to create highlight
> snippets
> by using (existing; traditional; I mean not FVH) Highlighter.
> Highlighter (including FVH) cannot produce snippets sometime for some
> reasons,
> you can use hl.alternateField parameter.
> 
> http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
> 
> koji
> -- 
> Query Log Visualizer for Apache Solr
> http://soleami.com/
> 

Thank you so much for the explanation.

I have updated my Solr version and am using 3.5. Could you please tell me:
when I am using a custom tokenizer on the field, do I need to make any
changes related to the Solr highlighter?

here is my custom analyser

 
  

  
  

 
  


here is the field info:



I am creating tokens using my custom analyser, and when I try to use the
highlighter it does not work properly for the contents field. But when I
tried Solr's built-in tokenizer, I found the word highlighted for the same
query. Can you please help me out with this?


Thanks in advance
Dhaivat







Re: how to mock solr server solr_sruby

2012-02-22 Thread Erik Hatcher
You can mock with Ruby really easily, just by overriding methods that would 
otherwise call the server and faking a response.  The solr-ruby library itself 
was built with an extensive test suite doing this.  Here's the mock base:

   


You can see its use here:

  


Erik

On Feb 22, 2012, at 05:30 , solr wrote:

> Hi,
> Am using solr_ruby in ruby code for that am starting solr server by using
> start.jsr.
> Now i want to write mockobjects for solr connection and code written in my
> ruby file to search data from solr.
> Can anybody suggest how to do testing without stating solr server
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-mock-solr-server-solr-sruby-tp3766080p3766080.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread Erick Erickson
You'll *really like* the SolrCloud stuff going into trunk when it's baked
for a while

Best
Erick

On Wed, Feb 22, 2012 at 3:25 AM, eks dev  wrote:
> Yes, I consciously let my slaves run away from the master in order to
> reduce update latency, but every now and then they sync up with master
> that is doing heavy lifting.
>
> The price you pay is that slaves do not see the same documents as the
> master, but this is the case anyhow with replication, in my setup
> slave may go ahead of master with updates, this delta gets zeroed
> after replication and the game starts again.
>
> What you have to take into account with this is very small time window
> where you may "go back in time" on slaves (not seeing documents that
> were already there), but we are talking about seconds and a couple out
> of 200Mio documents (only those documents that were softComited on
> slave during replication, since commit ond master and postCommit on
> slave).
>
> Why do you think something is strange here?
>
>> What are you expecting a BeforeCommitListener could do for you, if one
>> would exist?
> Why should I be expecting something?
>
> I just need to read userCommit Data as soon as replication is done,
> and I am looking for proper/easy way to do it.  (postCommitListener is
> what I use now).
>
> What makes me slightly nervous are those life cycle questions, e.g.
> when I issue update command before and after postCommit event, which
> index gets updated, the one just replicated or the one that was there
> just before replication.
>
> There are definitely ways to optimize this, for example to force
> replication handler to copy only delta files if index gets updated on
> slave and master  (there is already todo somewhere on solr replication
> Wiki I think). Now replicationHandler copies complete index if this
> gets detected ...
>
> I am all ears if there are better proposals to have low latency
> updates in multi server setup...
>
>
> On Tue, Feb 21, 2012 at 11:53 PM, Em  wrote:
>> Eks,
>>
>> that sounds strange!
>>
>> Am I getting you right?
>> You have a master which indexes batch-updates from time to time.
>> Furthermore you got some slaves, pulling data from that master to keep
>> them up-to-date with the newest batch-updates.
>> Additionally your slaves index own content in soft-commit mode that
>> needs to be available as soon as possible.
>> In consequence the slavesare not in sync with the master.
>>
>> I am not 100% certain, but chances are good that Solr's
>> replication-mechanism only changes those segments that are not in sync
>> with the master.
>>
>> What are you expecting a BeforeCommitListener could do for you, if one
>> would exist?
>>
>> Kind regards,
>> Em
>>
>> Am 21.02.2012 21:10, schrieb eks dev:
>>> Thanks Mark,
>>> Hmm, I would like to have this information asap, not to wait until the
>>> first search gets executed (depends on user) . Is solr going to create
>>> new searcher as a part of "replication transaction"...
>>>
>>> Just to make it clear why I need it...
>>> I have simple master, many slaves config where master does "batch"
>>> updates in big chunks (things user can wait longer to see on search
>>> side) but slaves work in soft commit mode internally where I permit
>>> them to run away slightly from master in order to know where
>>> "incremental update" should start, I read it from UserData 
>>>
>>> Basically, ideally, before commit (after successful replication is
>>> finished) ends, I would like to read in these counters to let
>>> "incremental update" run from the right point...
>>>
>>> I need to prevent updating "replicated index" before I read this
>>> information (duplicates can appear) are there any "IndexWriter"
>>> listeners around?
>>>
>>>
>>> Thanks again,
>>> eks.
>>>
>>>
>>>
>>> On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
 Post commit calls are made before a new searcher is opened.

 Might be easier to try to hook in with a new searcher listener?

 On Feb 21, 2012, at 8:23 AM, eks dev wrote:

> Hi all,
> I am a bit confused with IndexSearcher refresh lifecycles...
> In a master slave setup, I override postCommit listener on slave
> (solr trunk version) to read some user information stored in
> userCommitData on master
>
> --
> @Override
> public final void postCommit() {
> // This returnes "stale" information that was present before
> replication finished
> RefCounted refC = core.getNewestSearcher(true);
> Map userData =
> refC.get().getIndexReader().getIndexCommit().getUserData();
> }
> 
> I expected core.getNewestSearcher(true); to return refreshed
> SolrIndexSearcher, but it didn't
>
> When is this information going to be refreshed to the status from the
> replicated index, I repeat this is postCommit listener?
>
> What is the way to get the information from the last commit point?
>
> Maybe like this?
> core.ge

Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread Em
Erick,

> You'll *really like* the SolrCloud stuff going into trunk when it's baked
> for a while
How stable is SolrCloud at the moment?
I can not wait to try it out.

Kind regards,
Em


Am 22.02.2012 14:45, schrieb Erick Erickson:
> You'll *really like* the SolrCloud stuff going into trunk when it's baked
> for a while
> 
> Best
> Erick
> 
> On Wed, Feb 22, 2012 at 3:25 AM, eks dev  wrote:
>> Yes, I consciously let my slaves run away from the master in order to
>> reduce update latency, but every now and then they sync up with master
>> that is doing heavy lifting.
>>
>> The price you pay is that slaves do not see the same documents as the
>> master, but this is the case anyhow with replication, in my setup
>> slave may go ahead of master with updates, this delta gets zeroed
>> after replication and the game starts again.
>>
>> What you have to take into account with this is very small time window
>> where you may "go back in time" on slaves (not seeing documents that
>> were already there), but we are talking about seconds and a couple out
>> of 200Mio documents (only those documents that were softComited on
>> slave during replication, since commit ond master and postCommit on
>> slave).
>>
>> Why do you think something is strange here?
>>
>>> What are you expecting a BeforeCommitListener could do for you, if one
>>> would exist?
>> Why should I be expecting something?
>>
>> I just need to read userCommit Data as soon as replication is done,
>> and I am looking for proper/easy way to do it.  (postCommitListener is
>> what I use now).
>>
>> What makes me slightly nervous are those life cycle questions, e.g.
>> when I issue update command before and after postCommit event, which
>> index gets updated, the one just replicated or the one that was there
>> just before replication.
>>
>> There are definitely ways to optimize this, for example to force
>> replication handler to copy only delta files if index gets updated on
>> slave and master  (there is already todo somewhere on solr replication
>> Wiki I think). Now replicationHandler copies complete index if this
>> gets detected ...
>>
>> I am all ears if there are better proposals to have low latency
>> updates in multi server setup...
>>
>>
>> On Tue, Feb 21, 2012 at 11:53 PM, Em  wrote:
>>> Eks,
>>>
>>> that sounds strange!
>>>
>>> Am I getting you right?
>>> You have a master which indexes batch-updates from time to time.
>>> Furthermore you got some slaves, pulling data from that master to keep
>>> them up-to-date with the newest batch-updates.
>>> Additionally your slaves index own content in soft-commit mode that
>>> needs to be available as soon as possible.
>>> In consequence the slavesare not in sync with the master.
>>>
>>> I am not 100% certain, but chances are good that Solr's
>>> replication-mechanism only changes those segments that are not in sync
>>> with the master.
>>>
>>> What are you expecting a BeforeCommitListener could do for you, if one
>>> would exist?
>>>
>>> Kind regards,
>>> Em
>>>
>>> Am 21.02.2012 21:10, schrieb eks dev:
 Thanks Mark,
 Hmm, I would like to have this information asap, not to wait until the
 first search gets executed (depends on user) . Is solr going to create
 new searcher as a part of "replication transaction"...

 Just to make it clear why I need it...
 I have simple master, many slaves config where master does "batch"
 updates in big chunks (things user can wait longer to see on search
 side) but slaves work in soft commit mode internally where I permit
 them to run away slightly from master in order to know where
 "incremental update" should start, I read it from UserData 

 Basically, ideally, before commit (after successful replication is
 finished) ends, I would like to read in these counters to let
 "incremental update" run from the right point...

 I need to prevent updating "replicated index" before I read this
 information (duplicates can appear) are there any "IndexWriter"
 listeners around?


 Thanks again,
 eks.



 On Tue, Feb 21, 2012 at 8:03 PM, Mark Miller  wrote:
> Post commit calls are made before a new searcher is opened.
>
> Might be easier to try to hook in with a new searcher listener?
>
> On Feb 21, 2012, at 8:23 AM, eks dev wrote:
>
>> Hi all,
>> I am a bit confused with IndexSearcher refresh lifecycles...
>> In a master slave setup, I override postCommit listener on slave
>> (solr trunk version) to read some user information stored in
>> userCommitData on master
>>
>> --
>> @Override
>> public final void postCommit() {
>> // This returnes "stale" information that was present before
>> replication finished
>> RefCounted refC = core.getNewestSearcher(true);
>> Map userData =
>> refC.get().getIndexReader().getIndexCommit().getUserData();
>> }
>> 
>> 

Re: How to handle to run testcases in ruby code for solr

2012-02-22 Thread solr
Hi Erik,
I have tried the links which you gave. While running rake
I am getting this error:

==
Errno::ECONNREFUSED: No connection could be made because the target machine
actively refused it. - connect(2)
===



Solr on netty

2012-02-22 Thread prasenjit mukherjee
Is anybody aware of any effort regarding porting Solr to a Netty-based (or
any other async-IO based) framework?

Even on medium load (10 parallel clients) with 16 shards,
performance seems to deteriorate quite sharply compared to another
(async-IO based) alternative solution as load increases.

-Prasenjit

-- 
Sent from my mobile device


Re: How to handle to run testcases in ruby code for solr

2012-02-22 Thread Erik Hatcher
I'm not sure what to suggest at this point... obviously your test setup is 
trying to hit a Solr server that isn't running.  Check the host and port that 
it is trying and ensure that Solr is running as your tests expect or use the 
mock way that I just replied about.

Note, again, that solr-ruby is deprecated and unsupported at this point.  I 
recommend you give the RSolr project a try if you want support with it in the 
future.

Erik

On Feb 22, 2012, at 09:10 , solr wrote:

> Hi Erik,
> I have tried links which you given. while runnign rake
> am getting error
> 
> ==
> Errno::ECONNREFUSED: No connection could be made because the target machine
> acti
> vely refused it. - connect(2)
> ===
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-handle-to-run-testcases-in-ruby-code-for-solr-tp3753479p3766559.html
> Sent from the Solr - User mailing list archive at Nabble.com.



solr 3.5 and indexing performance

2012-02-22 Thread mizayah
Hello,

I wanted to switch to a new version of Solr, specifically 3.5, but I'm
getting a big drop in indexing speed.

I'm using 3.1, and after a few tests I discovered that 3.4 does it a lot
better than 3.5.

My schema is really simple, a few fields using the "text" field type.

/
  









  
  








  

/



All data and configuration are the same, same schema, solrconfig, same
jetty.

*SOLR 3.5*
/Feb 22, 2012 3:40:33 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=/vol/home/mciurla/proj/solr/accordion3.5/example/solr/data/index,segFN=segments_bl,version=1329831219365,generation=417,filenames=[_a5.fdx,
_52.fdx, _aq.frq,
_a5.fdt, _cr.nrm, _52.fnm, _a5.prx, segments_bl, _52.fdt, _7k.tii, _cr.frq,
_a5.tis, _cr.fdt, _a5.nrm, _cr.prx, _cp.prx, _cr.fdx, _cn.nrm, _52.tvf,
_cp.fnm, _co.tii, _52.tvd, _8
o.tvx, _co.tis, _8o.tii, _a5.fnm, _8o.tvd, _7k.tis, _8o.tvf, _bb.tis,
_7k.fdx, _7k.fdt, _7k.frq, _bb.tii, _cn.frq, _co.prx, _aq.tii, _cq.fdx,
_52.tii, _cm.tis, _cq.fdt, _aq.tis,
 _52.tis, _aq.tvx, _co.nrm, _bb.prx, _cm.tii, _cr.fnm, _aq.tvf, _bb_3.del,
_aq.tvd, _cm.frq, _cp.nrm, _cq.tis, _52.prx, _cn.tis, _8o.fnm, _cl.nrm,
_cl.fnm, _a5.tii, _cn.tii, _cq
.tii, _cp.tis, _cp.fdt, _cl.fdt, _cl.prx, _aq.fdt, _cl.fdx, _cr.tis,
_co.frq, _7k.fnm, _cq.frq, _bb.fnm, _cr.tii, _cp.fdx, _cp.tii, _aq.fdx,
_cq.tvd, _8o.fdt, _cq.tvf, _52.nrm,
_8o.nrm, _aq.fnm, _8o.prx, _co.tvd, _cq.tvx, _52.frq, _bb.nrm, _bb.fdt,
_cp.tvf, _a5.tvx, _cp.tvd, _cn.tvx, _7k.nrm, _bb.fdx, _cm.tvx, _cm.fdx,
_cl.tvf, _cp.tvx, _co.fdx, _cl.tv
d, _cn.tvf, _a5.frq, _cm.fdt, _a5.tvf, _co.fdt, _a5.tvd, _cp.frq, _cn.fdt,
_cm.nrm, _7k_d.del, _cn.fdx, _52_1e.del, _7k.prx, _8o.fdx, _cn.prx, _cl.tis,
_cq.nrm, _7k.tvx, _cq.prx
, _cn.tvd, _cl.tii, _cm.fnm, _7k.tvd, _cm.prx, _8o.tis, _cm.tvf, _52.tvx,
_7k.tvf, _cl.tvx, _cm.tvd, _a5_9.del, _bb.tvf, _bb.tvd, _cr.tvd, _co.tvf,
_bb.tvx, _cr.tvf, _co.fnm, _a
q.prx, _cl.frq, _cq.fnm, _aq_9.del, _bb.frq, _8o.frq, _aq.nrm, _co.tvx,
_8o_t.del, _cr.tvx, _cn.fnm, _cl_6.del]
Feb 22, 2012 3:40:33 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1329831219365
Feb 22, 2012 3:40:47 PM org.apache.solr.update.processor.LogUpdateProcessor
finish
*INFO: {add=[2271874, 2271875, 2271876, 2271877, 2271878, 2271879, 2271880,
2271881, ... (100 adds)]} 0 14213*
Feb 22, 2012 3:40:47 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=14213
/
when on solr 3.4
/Feb 22, 2012 3:42:56 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
commit{dir=/vol/home/mciurla/proj/solr/accordion3.4/example/solr/data/index,segFN=segments_29,version=1329918470592,generation=81,filenames=[_2b.tvf,
_2c.tvx, _2d.tvf, _2f.tvx, _2d.tvd, _15.prx, _15.frq, _2b.tvd, _2c.nrm,
_20.fnm, _2b.tvx, _2c.fdx, _2c.prx, _2f.tii, _2f.tvf, _20.tvx, _2b.fnm,
_2c.fdt, _2d.tis, _15.fdt, _20.frq, _2d.tvx, _2f.tvd, _15.fdx, _15.fnm,
_2c.tvf, _2e.frq, _2e.prx, _2c.tvd, _2b.frq, _20.tvd, _2c.fnm, _20.tvf,
_2e.tvf, _2e.nrm, _20.tis, _2b.prx, _20.tii, _2e.tvd, _15.tis, _2f.frq,
_15.tii, _2e.tvx, _2e.tii, _2c.tis, _2c.frq, _2e.fdx, _2f.prx, _2f.fnm,
_15.tvx, _2e.fdt, _15.tvf, _2b.tis, _2c.tii, _2d.prx, _2d.fnm, _20.fdx,
_2b.tii, _2e.tis, _20.fdt, _2d.frq, _2b.nrm, _15.tvd, _15_b.del, _2b.fdt,
_2f.nrm, _2d.fdx, segments_29, _2d.fdt, _2b.fdx, _20_2.del, _15.nrm,
_2f.tis, _2d.tii, _2d.nrm, _20.prx, _20.nrm, _2e.fnm, _2f.fdt, _2f.fdx] Feb
22, 2012 3:42:56 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1329918470592 
Feb 22, 2012 3:42:56 PM org.apache.solr.update.processor.LogUpdateProcessor
finish 
*INFO: {add=[2269393, 2269394, 2269395, 2269396, 2269397, 2269398, 2269399,
2269400, ... (100 adds)]} 0 145 
*Feb 22, 2012 3:42:56 PM org.apache.solr.core.SolrCore execute INFO: []
webapp=/solr path=/update params={} status=0 QTime=145/


*Any idea what is going on?*



Re: reader/searcher refresh after replication (commit)

2012-02-22 Thread Erick Erickson
It's certainly stable enough to start experimenting with, and I know
that it's under pretty active development now. I've seen a lot
of back-and-forth between Mark Miller and Jamie Johnson,
Jamie trying things and Mark responding.

It's part of the trunk, so be prepared for occasional re-indexing
being required. This isn't related to SolrCloud, just the
fact that it's only available on trunk.

And I'm certain that the more eyes look at it, the better it'll be,
so I'd say "go for it". I tried out the example here:
http://wiki.apache.org/solr/SolrCloud
and it went quite well, but I didn't stress it much yet (that's next).

Personally, I'd put it through some pretty heavy testing before
deploying to production at this point, just because of all the
new features on trunk. But having people work with it is the best
way to move the effort forward.

So feel free!
Erick

On Wed, Feb 22, 2012 at 9:07 AM, Em  wrote:
> Erick,
>
>> You'll *really like* the SolrCloud stuff going into trunk when it's baked
>> for a while
> How stable is SolrCloud at the moment?
> I can not wait to try it out.
>
> Kind regards,
> Em
>
>
> Am 22.02.2012 14:45, schrieb Erick Erickson:
>> You'll *really like* the SolrCloud stuff going into trunk when it's baked
>> for a while
>>
>> Best
>> Erick
>>
>> On Wed, Feb 22, 2012 at 3:25 AM, eks dev  wrote:
>>> Yes, I consciously let my slaves run away from the master in order to
>>> reduce update latency, but every now and then they sync up with master
>>> that is doing heavy lifting.
>>>
>>> The price you pay is that slaves do not see the same documents as the
>>> master, but this is the case anyhow with replication, in my setup
>>> slave may go ahead of master with updates, this delta gets zeroed
>>> after replication and the game starts again.
>>>
>>> What you have to take into account with this is very small time window
>>> where you may "go back in time" on slaves (not seeing documents that
>>> were already there), but we are talking about seconds and a couple out
>>> of 200Mio documents (only those documents that were softComited on
>>> slave during replication, since commit ond master and postCommit on
>>> slave).
>>>
>>> Why do you think something is strange here?
>>>
 What are you expecting a BeforeCommitListener could do for you, if one
 would exist?
>>> Why should I be expecting something?
>>>
>>> I just need to read userCommit Data as soon as replication is done,
>>> and I am looking for proper/easy way to do it.  (postCommitListener is
>>> what I use now).
>>>
>>> What makes me slightly nervous are those life cycle questions, e.g.
>>> when I issue update command before and after postCommit event, which
>>> index gets updated, the one just replicated or the one that was there
>>> just before replication.
>>>
>>> There are definitely ways to optimize this, for example to force
>>> replication handler to copy only delta files if index gets updated on
>>> slave and master  (there is already todo somewhere on solr replication
>>> Wiki I think). Now replicationHandler copies complete index if this
>>> gets detected ...
>>>
>>> I am all ears if there are better proposals to have low latency
>>> updates in multi server setup...
>>>
>>>
>>> On Tue, Feb 21, 2012 at 11:53 PM, Em  wrote:
 Eks,

 that sounds strange!

 Am I getting you right?
 You have a master which indexes batch-updates from time to time.
 Furthermore you got some slaves, pulling data from that master to keep
 them up-to-date with the newest batch-updates.
 Additionally your slaves index own content in soft-commit mode that
 needs to be available as soon as possible.
 In consequence the slavesare not in sync with the master.

 I am not 100% certain, but chances are good that Solr's
 replication-mechanism only changes those segments that are not in sync
 with the master.

 What are you expecting a BeforeCommitListener could do for you, if one
 would exist?

 Kind regards,
 Em

 Am 21.02.2012 21:10, schrieb eks dev:
> Thanks Mark,
> Hmm, I would like to have this information asap, not to wait until the
> first search gets executed (depends on user) . Is solr going to create
> new searcher as a part of "replication transaction"...
>
> Just to make it clear why I need it...
> I have simple master, many slaves config where master does "batch"
> updates in big chunks (things user can wait longer to see on search
> side) but slaves work in soft commit mode internally where I permit
> them to run away slightly from master in order to know where
> "incremental update" should start, I read it from UserData 
>
> Basically, ideally, before commit (after successful replication is
> finished) ends, I would like to read in these counters to let
> "incremental update" run from the right point...
>
> I need to prevent updating "replicated index" before I read this
>

Re: Solr on netty

2012-02-22 Thread Yonik Seeley
On Wed, Feb 22, 2012 at 9:27 AM, prasenjit mukherjee
 wrote:
> Is anybody aware of any effort regarding porting solr to a netty ( or
> any other async-io based framework ) based framework.
>
> Even on medium load ( 10 parallel clients )  with 16 shards
> performance seems to deteriorate quite sharply compared another
> alternative ( async-io based ) solution as load increases.

By "16 shards" do you mean you have 16 nodes and each single client
request causes a distributed search across all of them them?  How many
concurrent requests are your 10 clients making to each node?

NIO works well when there are many clients, but when servicing those
client requests only needs intermittent CPU.  That's not the pattern
we see for search.
You *can* easily configure Solr's Jetty to use NIO when accepting
client connections, but it won't do you any good, just as switching to
Netty wouldn't do anything here.

Where NIO could help a little is with the requests that Solr makes to
other Solr instances.  Solr is already architected for async
request-response to other nodes, but the current underlying
implementation uses HttpClient 3 (which doesn't have NIO).

Anyway, it's unlikely that NIO vs BIO will make much of a difference
with the numbers you're talking about (16 shards).

Someone else reported that we have the number of connections per host
set too low, and they saw big gains by increasing this.  There's an
issue open to make this configurable in 3x:
https://issues.apache.org/jira/browse/SOLR-3079
We should probably up the max connections per host by default.
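
For illustration only, this is the kind of HttpClient 3.x knob SOLR-3079 is
about; where (and whether) a given Solr version lets you wire it in is exactly
what that issue tracks, so treat the snippet as a sketch of the setting rather
than as Solr configuration:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

public class ShardHttpClientSketch {
  // Raise the per-host and total connection limits on an HttpClient 3.x
  // connection manager of the kind Solr uses for inter-shard requests.
  public static HttpClient create() {
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    mgr.getParams().setDefaultMaxConnectionsPerHost(32); // HttpClient's default is 2
    mgr.getParams().setMaxTotalConnections(128);
    return new HttpClient(mgr);
  }
}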

-Yonik
lucidimagination.com


Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
Hello,

Would it be unusual for an import of 160 million documents to take 18 hours?  
Each document is less than 1kb and I have the DataImportHandler using the jdbc 
driver to connect to SQL Server 2008. The full-import query calls a stored 
procedure that contains only a select from my target table.

Is there any way I can speed this up? I saw recently someone on this list 
suggested a new user could get all their Solr data imported in under an hour. I 
sure hope that's true!


Devon Baumgarten




Re: Unusually long data import time?

2012-02-22 Thread Glen Newton
Import times will depend on:
- hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
- Java configuration (heap size, etc)
- Lucene/Solr configuration (many ...)
- Index configuration - how many fields, indexed how; faceting, etc
- OS configuration (this usually to a lesser degree; _usually_)
- Network issues if non-local
- DB configuration (driver, etc)

If you can give more information about the above, people on this list
should be able to better indicate whether 18 hours sounds right for
your situation.
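
(As a concrete example of the Java configuration item: with the bundled Jetty
example the heap is set on the startup command line, e.g. java -Xmx2g -jar
start.jar on a 64-bit JVM; the value here is purely illustrative, but an
untuned default heap is one of the first things worth ruling out for slow
bulk indexing.)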

-Glen Newton

On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
 wrote:
> Hello,
>
> Would it be unusual for an import of 160 million documents to take 18 hours?  
> Each document is less than 1kb and I have the DataImportHandler using the 
> jdbc driver to connect to SQL Server 2008. The full-import query calls a 
> stored procedure that contains only a select from my target table.
>
> Is there any way I can speed this up? I saw recently someone on this list 
> suggested a new user could get all their Solr data imported in under an hour. 
> I sure hope that's true!
>
>
> Devon Baumgarten
>
>



-- 
-
http://zzzoot.blogspot.com/
-


Re: solr 3.5 and indexing performance

2012-02-22 Thread Ahmet Arslan
> I wanted to switch to new version of solr, exactelly to 3.5
> but im getting
> big drop of indexing speed.

Could it be   configuration in solrconfig.xml?


SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-22 Thread eks dev
We started observing strange failures from ReplicationHandler when we
commit on the master (running a trunk version 4-5 days old).
It works sometimes and sometimes not; we didn't dig deeper yet.

Looks like the real culprit hides behind:
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed

Does this look familiar to anybody?


120222 154959 SEVERE SnapPull failed
:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
at org.apache.solr.update.DirectUpdateHandler2.commit(Unknown Source)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:503)
at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:348)
at org.apache.solr.handler.ReplicationHandler.doFetch(Unknown Source)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:163)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.lucene.store.AlreadyClosedException: this
IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:810)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:815)
at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:3984)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:254)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:233)
at 
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:223)
at 
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1095)
... 15 more


Re: Solr on netty

2012-02-22 Thread prasenjit mukherjee
Thanks for the response.

Yes, we have 16 shards/partitions, each on one of 16 different nodes, and a
separate master Solr receiving continuous parallel requests from 10
client threads running on a single separate machine.
Our observation was that the performance degraded non-linearly as the load
(number of concurrent clients) increased.

I have some follow-up questions:

1. What is the default max number of threads configured when a Solr
instance makes calls to the other 16 partitions?

2. How do I increase the max number of connections for solr<-->solr
interactions, as you mentioned in your mail?



On 2/22/12, Yonik Seeley  wrote:
> On Wed, Feb 22, 2012 at 9:27 AM, prasenjit mukherjee
>  wrote:
>> Is anybody aware of any effort regarding porting solr to a netty ( or
>> any other async-io based framework ) based framework.
>>
>> Even on medium load ( 10 parallel clients )  with 16 shards
>> performance seems to deteriorate quite sharply compared another
>> alternative ( async-io based ) solution as load increases.
>
> By "16 shards" do you mean you have 16 nodes and each single client
> request causes a distributed search across all of them them?  How many
> concurrent requests are your 10 clients making to each node?
>
> NIO works well when there are many clients, but when servicing those
> client requests only needs intermittent CPU.  That's not the pattern
> we see for search.
> You *can* easily configure Solr's Jetty to use NIO when accepting
> client connections, but it won't do you any good, just as switching to
> Netty wouldn't do anything here.
>
> Where NIO could help a little is with the requests that Solr makes to
> other Solr instances.  Solr is already architected for async
> request-response to other nodes, but the current underlying
> implementation uses HttpClient 3 (which doesn't have NIO).
>
> Anyway, it's unlikely that NIO vs BIO will make much of a difference
> with the numbers you're talking about (16 shards).
>
> Someone else reported that we have the number of connections per host
> set too low, and they saw big gains by increasing this.  There's an
> issue open to make this configurable in 3x:
> https://issues.apache.org/jira/browse/SOLR-3079
> We should probably up the max connections per host by default.
>
> -Yonik
> lucidimagination.com
>

-- 
Sent from my mobile device


RE: Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
Oh sure! As best as I can, anyway.

I have not set the Java heap size, or really configured it at all. 

The server running both the SQL Server and Solr has:
* 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
* 64 GB RAM
* One Solr instance (no shards)

I'm not using faceting.
My schema has these fields:
   
   
   
   
   
   
  

Custom types:

*LikeText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory 
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
EdgeNGramFilterFactory
LengthFilterFactory (min:3, max:512)

*FuzzyText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory 
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
NGramFilterFactory
LengthFilterFactory (min:3, max:512)

Devon Baumgarten


-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: Wednesday, February 22, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

Import times will depend on:
- hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
- Java configuration (heap size, etc)
- Lucene/Solr configuration (many ...)
- Index configuration - how many fields, indexed how; faceting, etc
- OS configuration (this usually to a lesser degree; _usually_)
- Network issues if non-local
- DB configuration (driver, etc)

If you can give more information about the above, people on this list
should be able to better indicate whether 18 hours sounds right for
your situation.

-Glen Newton

On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
 wrote:
> Hello,
>
> Would it be unusual for an import of 160 million documents to take 18 hours?  
> Each document is less than 1kb and I have the DataImportHandler using the 
> jdbc driver to connect to SQL Server 2008. The full-import query calls a 
> stored procedure that contains only a select from my target table.
>
> Is there any way I can speed this up? I saw recently someone on this list 
> suggested a new user could get all their Solr data imported in under an hour. 
> I sure hope that's true!
>
>
> Devon Baumgarten
>
>



-- 
-
http://zzzoot.blogspot.com/
-


Re: Fields, Facets, and Search Results

2012-02-22 Thread drLocke97
Hi darul,

You're right, I was not using <copyField>. So, following your
suggestions, I added

<defaultSearchField>text</defaultSearchField>

and

<copyField source="section_text_content" dest="text"/>

This required that I add a field "text", which is fine. I did that. Now,
when I commit the documents for indexing, I get this error:

SOLR returned a #400 Error: Error adding field "section_text_content". . .



Solr Performance Improvement and degradation Help

2012-02-22 Thread naptowndev
As I've mentioned before, I'm very new to Solr.  I'm not a Java guy or an
Apache guy.  I'm a .Net guy.

We have a rather large schema - some 100 + fields plus a large number of
dynamic fields.

We've been trying to improve performance and finally got around to
implementing fastvectorhighlighting which gave us an immediate improvement
on the qtime (nearly 70%) which also improved the overall response time by
over 20%.

With that, we also bring back an extraordinarly large amount of data in the
XML. Some results (20 records) come back with a payload between 3MB and even
17MB.  We have a lot of report text that is used for searching and
highlighting.  We recently implemented field list wildcards on two versions
of Solr to test it out.  This allowed us to leave the report text off the
return and decreased the payload significantly - by nearly 85% in the large
cases...  

SO, we'd expect a performance boost there, however we are seeing greatly
increased response times on these builds of Solr even though the qtime is
incredibly fast.  

To put it in perspective - our original Solr core is 4.0, I believe the
4.0.0.2010.12.10.08.54.56 version.

On our test boxes, we have one running 4.0.0.2011.11.17 and one running
4.0.0.2012.02.16 version.

with the older version (not having the wildcard field list), it returns a
payload of approximately 13MB in an average of 1.5 seconds.  with the new
version (2012.02.16) which is on the same machines as the older version (so
network traffic/latency/hardware/etc are all the same), it's returning the
reduced payload (approximately 1.5MB in an average of 3.5-4 seconds).  I
will say that we reloaded the core once and briefly saw the 1.5MB payload
come back in 150-200 milliseconds, but within minutes we were back to the
3.5-4 seconds.  We also noticed the CPU was being pegged for seconds when
running the queries on the new build with the wildcard field list.

We have a lower scale box running the 2011.11.17 version and had more
success for a while.  We were getting the 150-200 ms response time on the
reduced payload for probably 30 minutes or so, and then it did the same
thing - bumped up to 3-4 seconds in response time.

Anyone have any experience with this type of random yet consistent
performance degradation or have insight as to what might be causing the
issues and how to fix them?

We'd love to not only have the performance boost from fast vector
highlighting, but also the decreased payload size.

Thanks in advance!



Re: How to merge an "autofacet" with a predefined facet

2012-02-22 Thread Xavier
I'm not sure I understand your solution.

When (and how) will the "word" detection in the full text happen? Before (on
my own) or during Solr indexing (with Solr)?



Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
I'm running into a problem with queries that contain forward slashes and more 
than one field.

For example, these queries work fine:
fieldName:/a
fieldName:/*

But if I have two fields with similar syntax in the same query, it fails.

For simplicity, I'm using the same field twice:

fieldName:/a fieldName:/a

results in: "no field name specified in query and no defaultSearchField defined 
in schema.xml"

SEVERE: org.apache.solr.common.SolrException: no field name specified in query 
and no defaultSearchField defined in schema.xml
at 
org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106)
at 
org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058)
at 
org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at 
org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at 
org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at 
org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)


fieldName:/* fieldName:/*

results in: null

java.lang.NullPointerException
at 
org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747)
at 
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026)
at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980)
at 
org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039)
at 
org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
at 
org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
at 
org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
at 
org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
at 
org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)


Any ideas as to what may be wrong and how can I make these work?

I'm on a 4.0 snapshot from Nov 29, 2011.


RE: Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
I changed the heap size (Xmx1582m was as high as I could go). The import is at 
about 5% now, and from that I now estimate about 13 hours. It's hard to say 
though... it keeps going up little by little.

If I get approval to use Solr for this project, I'll have them install a 64-bit 
JVM instead, but is there anything else I can do?


Devon Baumgarten
Application Developer


-Original Message-
From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] 
Sent: Wednesday, February 22, 2012 10:32 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unusually long data import time?

Oh sure! As best as I can, anyway.

I have not set the Java heap size, or really configured it at all. 

The server running both the SQL Server and Solr has:
* 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
* 64 GB RAM
* One Solr instance (no shards)

I'm not using faceting.
My schema has these fields:
   
   
   
   
   
   
  

Custom types:

*LikeText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory 
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
EdgeNGramFilterFactory
LengthFilterFactory (min:3, max:512)

*FuzzyText
PatternReplaceCharFilterFactory ("\W+" => "")
KeywordTokenizerFactory 
StopFilterFactory (~40 words in stoplist)
ASCIIFoldingFilterFactory
LowerCaseFilterFactory
NGramFilterFactory
LengthFilterFactory (min:3, max:512)

Devon Baumgarten


-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com] 
Sent: Wednesday, February 22, 2012 9:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

Import times will depend on:
- hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
- Java configuration (heap size, etc)
- Lucene/Solr configuration (many ...)
- Index configuration - how many fields, indexed how; faceting, etc
- OS configuration (this usually to a lesser degree; _usually_)
- Network issues if non-local
- DB configuration (driver, etc)

If you can give more information about the above, people on this list
should be able to better indicate whether 18 hours sounds right for
your situation.

-Glen Newton

On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
 wrote:
> Hello,
>
> Would it be unusual for an import of 160 million documents to take 18 hours?  
> Each document is less than 1kb and I have the DataImportHandler using the 
> jdbc driver to connect to SQL Server 2008. The full-import query calls a 
> stored procedure that contains only a select from my target table.
>
> Is there any way I can speed this up? I saw recently someone on this list 
> suggested a new user could get all their Solr data imported in under an hour. 
> I sure hope that's true!
>
>
> Devon Baumgarten
>
>



-- 
-
http://zzzoot.blogspot.com/
-


How to check if a field is a multivalue field with java

2012-02-22 Thread tschiela
Hello,

is there any way to check if a field of a SolrDocument is a multivalued
field with Java (SolrJ)?

Greets
Thomas

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-check-if-a-field-is-a-multivalue-field-with-java-tp3767200p3767200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr & HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Mr Gupta,

Thanks so much for your reply!

Retrieving data by keyword is one of my use cases, and I think Solr is
a proper choice for that.

However, Solr does not provide rich enough support for custom ranking, and
frequent updating is also not a good fit for Solr, so it is difficult to
retrieve data ordered by values other than keyword frequency in text. For
that case I am attempting to use HBase.

But I don't know how HBase supports high performance when it needs to keep
consistency in a large-scale distributed system.

Now both of them are used in my system.

I will check out ElasticSearch.

Best regards,
Bing


On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta wrote:

> Bing,
> Its a classic battle on whether to use solr or hbase or a combination of
> both. both systems are very different but there is some overlap in the
> utility. they also differ vastly when it compares to computation power,
> storage needs, etc. so in the end, it all boils down to your use case. you
> need to pick the technology that it best suited to your needs.
> im still not clear on your use case though.
>
> btw, if you haven't started using solr yet - then you might want to
> checkout ElasticSearch. I spent over a week researching between solr and ES
> and eventually chose ES due to its cool merits.
>
> thanks
>
>
> On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu  wrote:
>
>> There is no secondary index support in HBase at the moment.
>>
>> It's on our road map.
>>
>> FYI
>>
>> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li  wrote:
>>
>> > Jacques,
>> >
>> > Yes. But I still have questions about that.
>> >
>> > In my system, when users search with a keyword arbitrarily, the query is
>> > forwarded to Solr. No any updating operations but appending new indexes
>> > exist in Solr managed data.
>> >
>> > When I need to retrieve data based on ranking values, HBase is used.
>> And,
>> > the ranking values need to be updated all the time.
>> >
>> > Is that correct?
>> >
>> > My question is that the performance must be low if keeping consistency
>> in a
>> > large scale distributed environment. How does HBase handle this issue?
>> >
>> > Thanks so much!
>> >
>> > Bing
>> >
>> >
>> > On Thu, Feb 23, 2012 at 1:17 AM, Jacques  wrote:
>> >
>> > > It is highly unlikely that you could replace Solr with HBase.  They're
>> > > really apples and oranges.
>> > >
>> > >
>> > > On Wed, Feb 22, 2012 at 1:09 AM, Bing Li  wrote:
>> > >
>> > >> Dear all,
>> > >>
>> > >> I wonder how data in HBase is indexed? Now Solr is used in my system
>> > >> because data is managed in inverted index. Such an index is suitable
>> to
>> > >> retrieve unstructured and huge amount of data. How does HBase deal
>> with
>> > >> the
>> > >> issue? May I replaced Solr with HBase?
>> > >>
>> > >> Thanks so much!
>> > >>
>> > >> Best regards,
>> > >> Bing
>> > >>
>> > >
>> > >
>> >
>>
>
>


Re: Unusually long data import time?

2012-02-22 Thread Walter Underwood
In my first try with the DIH, I had several sub-entities and it was making six 
queries per document. My 20M doc load was going to take many hours, most of a 
day. I re-wrote it to eliminate those, and now it makes a single query for the 
whole load and takes 70 minutes. These are small documents, just the metadata 
for each book.

wunder
Search Guy
Chegg

On Feb 22, 2012, at 9:41 AM, Devon Baumgarten wrote:

> I changed the heap size (Xmx1582m was as high as I could go). The import is 
> at about 5% now, and from that I now estimate about 13 hours. It's hard to 
> say though.. it keeps going up little by little.
> 
> If I get approval to use Solr for this project, I'll have them install a 
> 64bit jvm instead, but is there anything else I can do?
> 
> 
> Devon Baumgarten
> Application Developer
> 
> 
> -Original Message-
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] 
> Sent: Wednesday, February 22, 2012 10:32 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Unusually long data import time?
> 
> Oh sure! As best as I can, anyway.
> 
> I have not set the Java heap size, or really configured it at all. 
> 
> The server running both the SQL Server and Solr has:
> * 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
> * 64 GB RAM
> * One Solr instance (no shards)
> 
> I'm not using faceting.
> My schema has these fields:
>   
>   
>   
>   termVectors="true" /> 
>   termVectors="true" /> 
>   
>  
> 
> Custom types:
> 
> *LikeText
>   PatternReplaceCharFilterFactory ("\W+" => "")
>   KeywordTokenizerFactory 
>   StopFilterFactory (~40 words in stoplist)
>   ASCIIFoldingFilterFactory
>   LowerCaseFilterFactory
>   EdgeNGramFilterFactory
>   LengthFilterFactory (min:3, max:512)
> 
> *FuzzyText
>   PatternReplaceCharFilterFactory ("\W+" => "")
>   KeywordTokenizerFactory 
>   StopFilterFactory (~40 words in stoplist)
>   ASCIIFoldingFilterFactory
>   LowerCaseFilterFactory
>   NGramFilterFactory
>   LengthFilterFactory (min:3, max:512)
> 
> Devon Baumgarten
> 
> 
> -Original Message-
> From: Glen Newton [mailto:glen.new...@gmail.com] 
> Sent: Wednesday, February 22, 2012 9:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Unusually long data import time?
> 
> Import times will depend on:
> - hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
> - Java configuration (heap size, etc)
> - Lucene/Solr configuration (many ...)
> - Index configuration - how many fields, indexed how; faceting, etc
> - OS configuration (this usually to a lesser degree; _usually_)
> - Network issues if non-local
> - DB configuration (driver, etc)
> 
> If you can give more information about the above, people on this list
> should be able to better indicate whether 18 hours sounds right for
> your situation.
> 
> -Glen Newton
> 
> On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
>  wrote:
>> Hello,
>> 
>> Would it be unusual for an import of 160 million documents to take 18 hours? 
>>  Each document is less than 1kb and I have the DataImportHandler using the 
>> jdbc driver to connect to SQL Server 2008. The full-import query calls a 
>> stored procedure that contains only a select from my target table.
>> 
>> Is there any way I can speed this up? I saw recently someone on this list 
>> suggested a new user could get all their Solr data imported in under an 
>> hour. I sure hope that's true!
>> 
>> 
>> Devon Baumgarten
>> 
>> 
> 
> 
> 
> -- 
> -
> http://zzzoot.blogspot.com/
> -







Re: How to merge an "autofacet" with a predefined facet

2012-02-22 Thread Em
If you use the suggested solution, it will detect the words at indexing
time.
However, Solr's FilterFactory lifecycle keeps no track of whether a
file for synonyms, keywords, etc. has changed since Solr's last startup.
Therefore a change within these files is not visible until you reload
your core.

Furthermore, keywords for old documents aren't added automatically if you
change your keyword list (and reload the core) - you have to write a routine
that finds documents matching the new keywords and reindexes those documents.

Example:

Your keywordslist at time t1 contains two words:
keyword
codeword

You are indexing two documents:
doc1: {"content":"I am about a secret codeword."}
doc1: {"content":"Happy keyword and the gang."}

Your filter will mark "codeword" in doc1 and "keyword" in doc2 as words
to keep and remove everything else. Therefore their content for your
keepWordField contains only

doc1: {"indexedContent":"codeword"}
doc2: {"indexedContent":"keyword"}

However, if you add the word "gang" to your keywordlist AND reload your
SolrCore, doc2 will still only contain the term "keyword" until it gets
reindexed again.
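
To make that index-time behaviour concrete, here is a plain-Java sketch of what
the keep-word step effectively does (this is only an illustration, not the
actual KeepWordFilterFactory code; the keyword list and documents mirror the
example above):

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class KeepWordSketch {
    // Lowercase, split on non-word characters, and keep only listed keywords,
    // roughly what the analysis chain does at indexing time.
    static Set<String> keep(String content, Set<String> keywords) {
        Set<String> kept = new LinkedHashSet<String>();
        for (String token : content.toLowerCase().split("\\W+")) {
            if (keywords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> keywords =
                new LinkedHashSet<String>(Arrays.asList("keyword", "codeword"));
        System.out.println(keep("I am about a secret codeword.", keywords)); // [codeword]
        System.out.println(keep("Happy keyword and the gang.", keywords));   // [keyword]
        // Adding "gang" to the list only affects documents indexed afterwards;
        // doc2 keeps only "keyword" until it is reindexed.
    }
}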

Kind regards,
Em

Am 22.02.2012 17:56, schrieb Xavier:
> I'm not sure to understand your solution ?
> 
> When (and how) will be the 'word' detection in the fulltext ? before (by my
> own) or during (with) solr indexation ?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3767059.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: How to check if a field is a multivalue field with java

2012-02-22 Thread SUJIT PAL
Hi Thomas,

With Java (from within a custom handler in Solr) you can get a handle to the 
IndexSchema from the request, like so:

IndexSchema schema = req.getSchema();
SchemaField sf = schema.getField(fieldname);
boolean isMultiValued = sf.multiValued();

From within SolrJ code, you can use SolrDocument.getFieldValue(), which returns 
an Object, so you could do an instanceof check - if it's a Collection it's 
multivalued, else not.

Object value = sdoc.getFieldValue(fieldname);
boolean isMultiValued = value instanceof Collection;

At least this is what I do; I don't think there is a way to get a handle to the 
IndexSchema object over SolrJ...
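
Here is a self-contained sketch of that instanceof check over a result set; the
core URL and field names are hypothetical:

import java.util.Collection;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class MultiValuedCheck {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        for (SolrDocument doc : solr.query(new SolrQuery("*:*")).getResults()) {
            Object value = doc.getFieldValue("category");
            // A multivalued stored field comes back as a Collection; a
            // single-valued field comes back as the bare object.
            boolean multi = value instanceof Collection;
            System.out.println(doc.getFieldValue("id")
                    + " category multivalued? " + multi);
        }
    }
}

Keep in mind this only inspects the response; the schema (req.getSchema() above)
is the authoritative answer.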

-sujit

On Feb 22, 2012, at 9:41 AM, tschiela wrote:

> Hello,
> 
> is there any way to check, if a field of a SolrDocument ist a multivalue
> field with java (solrj)?
> 
> Greets
> Thomas
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-check-if-a-field-is-a-multivalue-field-with-java-tp3767200p3767200.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to merge an "autofacet" with a predefined facet

2012-02-22 Thread Em
Btw.:
Solr has no downtime while reloading the core.
It loads the new core and while loading the new one it still serves
requests with the old one.
When the new one is ready (and warmed up) it finally replaces the old core.

Best,
Em

Am 22.02.2012 17:56, schrieb Xavier:
> I'm not sure to understand your solution ?
> 
> When (and how) will be the 'word' detection in the fulltext ? before (by my
> own) or during (with) solr indexation ?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-merge-an-autofacet-with-a-predefined-facet-tp3763988p3767059.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Em
Yury,

are you sure your request has a proper url-encoding?

Kind regards,
Em

Am 22.02.2012 18:25, schrieb Yury Kats:
> I'm running into a problem with queries that contain forward slashes and more 
> than one field.
> 
> For example, these queries work fine:
> fieldName:/a
> fieldName:/*
> 
> But if I have two fields with similar syntax in the same query, it fails.
> 
> For simplicity, I'm using the same field twice:
> 
> fieldName:/a fieldName:/a
> 
> results in: "no field name specified in query and no defaultSearchField 
> defined in schema.xml"
> 
> SEVERE: org.apache.solr.common.SolrException: no field name specified in 
> query and no defaultSearchField defined in schema.xml
>   at 
> org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106)
>   at 
> org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124)
>   at 
> org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
>   at 
> org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
>   at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:143)
> 
> 
> fieldName:/* fieldName:/*
> 
> results in: null
> 
> java.lang.NullPointerException
>   at 
> org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747)
>   at 
> org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026)
>   at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980)
>   at 
> org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172)
>   at 
> org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
>   at 
> org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
>   at 
> org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
>   at 
> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
>   at org.apache.solr.search.QParser.getQuery(QParser.java:143)
> 
> 
> Any ideas as to what may be wrong and how can I make these work?
> 
> I'm on a 4.0 snapshot from Nov 29, 2011.
> 


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 12:25 PM, Yury Kats wrote:
> I'm running into a problem with queries that contain forward slashes and more 
> than one field.
> 
> For example, these queries work fine:
> fieldName:/a
> fieldName:/*
> 
> But if I have two fields with similar syntax in the same query, it fails.
> 
> For simplicity, I'm using the same field twice:
> 
> fieldName:/a fieldName:/a

Looks like escaping forward slashes makes the query work, eg
  fieldName:\/a fieldName:\/a

This is a bit puzzling as the forward slash is not part of the query language, 
is it?
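
For anyone else hitting this, here is a small SolrJ-flavoured sketch of the
escaping workaround above (the field name and values are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class SlashEscapeExample {
    // Prefix each '/' with a backslash so the parser does not treat the
    // clause as a regex query.
    static String escapeSlashes(String term) {
        return term.replace("/", "\\/");
    }

    public static void main(String[] args) {
        String q = "fieldName:" + escapeSlashes("/a")
                 + " fieldName:" + escapeSlashes("/a");
        System.out.println(q);              // fieldName:\/a fieldName:\/a
        SolrQuery query = new SolrQuery(q); // ready to send via SolrJ
        System.out.println(query.getQuery());
    }
}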


RE: Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
Walter,

Do you mean sub-entities in your database, or something else?

The data I am feeding DIH is from a select * (no joins or WHERE clause) on a 
table with:

int, int, varchar(32), varchar(32), varchar(512) (this one is the Name), 
varchar(512), datetime

If it matters, the select * is happening in a stored procedure.

Devon Baumgarten


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Wednesday, February 22, 2012 11:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

In my first try with the DIH, I had several sub-entities and it was making six 
queries per document. My 20M doc load was going to take many hours, most of a 
day. I re-wrote it to eliminate those, and now it makes a single query for the 
whole load and takes 70 minutes. These are small documents, just the metadata 
for each book.

wunder
Search Guy
Chegg

On Feb 22, 2012, at 9:41 AM, Devon Baumgarten wrote:

> I changed the heap size (Xmx1582m was as high as I could go). The import is 
> at about 5% now, and from that I now estimate about 13 hours. It's hard to 
> say though.. it keeps going up little by little.
> 
> If I get approval to use Solr for this project, I'll have them install a 
> 64bit jvm instead, but is there anything else I can do?
> 
> 
> Devon Baumgarten
> Application Developer
> 
> 
> -Original Message-
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] 
> Sent: Wednesday, February 22, 2012 10:32 AM
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Unusually long data import time?
> 
> Oh sure! As best as I can, anyway.
> 
> I have not set the Java heap size, or really configured it at all. 
> 
> The server running both the SQL Server and Solr has:
> * 2 Intel Xeon X5660 (each one is 2.8 GHz, 6 cores, 12 logical processors)
> * 64 GB RAM
> * One Solr instance (no shards)
> 
> I'm not using faceting.
> My schema has these fields:
>   
>   
>   
>   termVectors="true" /> 
>   termVectors="true" /> 
>   
>  
> 
> Custom types:
> 
> *LikeText
>   PatternReplaceCharFilterFactory ("\W+" => "")
>   KeywordTokenizerFactory 
>   StopFilterFactory (~40 words in stoplist)
>   ASCIIFoldingFilterFactory
>   LowerCaseFilterFactory
>   EdgeNGramFilterFactory
>   LengthFilterFactory (min:3, max:512)
> 
> *FuzzyText
>   PatternReplaceCharFilterFactory ("\W+" => "")
>   KeywordTokenizerFactory 
>   StopFilterFactory (~40 words in stoplist)
>   ASCIIFoldingFilterFactory
>   LowerCaseFilterFactory
>   NGramFilterFactory
>   LengthFilterFactory (min:3, max:512)
> 
> Devon Baumgarten
> 
> 
> -Original Message-
> From: Glen Newton [mailto:glen.new...@gmail.com] 
> Sent: Wednesday, February 22, 2012 9:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Unusually long data import time?
> 
> Import times will depend on:
> - hardware (speed of disks, cpu, # of cpus, amount of memory, etc)
> - Java configuration (heap size, etc)
> - Lucene/Solr configuration (many ...)
> - Index configuration - how many fields, indexed how; faceting, etc
> - OS configuration (this usually to a lesser degree; _usually_)
> - Network issues if non-local
> - DB configuration (driver, etc)
> 
> If you can give more information about the above, people on this list
> should be able to better indicate whether 18 hours sounds right for
> your situation.
> 
> -Glen Newton
> 
> On Wed, Feb 22, 2012 at 10:14 AM, Devon Baumgarten
>  wrote:
>> Hello,
>> 
>> Would it be unusual for an import of 160 million documents to take 18 hours? 
>>  Each document is less than 1kb and I have the DataImportHandler using the 
>> jdbc driver to connect to SQL Server 2008. The full-import query calls a 
>> stored procedure that contains only a select from my target table.
>> 
>> Is there any way I can speed this up? I saw recently someone on this list 
>> suggested a new user could get all their Solr data imported in under an 
>> hour. I sure hope that's true!
>> 
>> 
>> Devon Baumgarten
>> 
>> 
> 
> 
> 
> -- 
> -
> http://zzzoot.blogspot.com/
> -







Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:05 PM, Em wrote:
> Yury,
> 
> are you sure your request has a proper url-encoding?

Yes


Re: Solr & HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Jacques
>> Solr does not provide a complex enough support to rank.
I believe Solr has a lot of pluggability for writing your own custom ranking
approach.  If you think you can't do your desired ranking with Solr, you're
probably wrong and should ask the Solr community for help.

>> retrieving data by keyword is one of them. I think Solr is a proper
choice
The key to keyword retrieval is the construction of the data.  Among other
things, this is one of the key things that Solr is very good at: creating a
very efficient organization of the data so that you can retrieve quickly.
 At their core, Solr, ElasticSearch, Lily and Katta all use Lucene to
construct this data.  HBase is bad at this.

>> how HBase support high performance when it needs to keep consistency in
a large scale distributed system
HBase is primarily built for retrieving a single row at a time based on a
predetermined and known location (the key).  It is also very efficient at
splitting massive datasets across multiple machines and allowing sequential
batch analyses of these datasets.  HBase can maintain high performance in
this way because consistency only ever exists at the row level.  This is
what HBase is good at.

You need to focus what you're doing and then write it out.  Figure out how
you think the pieces should work together.  Read the documentation.  Then,
ask specific questions where you feel like the documentation is unclear or
you feel confused.  Your general questions are very difficult to answer in
any kind of really helpful way.

thanks,
Jacques


On Wed, Feb 22, 2012 at 9:51 AM, Bing Li  wrote:

> Mr Gupta,
>
> Thanks so much for your reply!
>
> In my use cases, retrieving data by keyword is one of them. I think Solr
> is a proper choice.
>
> However, Solr does not provide a complex enough support to rank. And,
> frequent updating is also not suitable in Solr. So it is difficult to
> retrieve data randomly based on the values other than keyword frequency in
> text. In this case, I attempt to use HBase.
>
> But I don't know how HBase support high performance when it needs to keep
> consistency in a large scale distributed system.
>
> Now both of them are used in my system.
>
> I will check out ElasticSearch.
>
> Best regards,
> Bing
>
>
> On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta wrote:
>
>> Bing,
>> Its a classic battle on whether to use solr or hbase or a combination of
>> both. both systems are very different but there is some overlap in the
>> utility. they also differ vastly when it compares to computation power,
>> storage needs, etc. so in the end, it all boils down to your use case. you
>> need to pick the technology that it best suited to your needs.
>> im still not clear on your use case though.
>>
>> btw, if you haven't started using solr yet - then you might want to
>> checkout ElasticSearch. I spent over a week researching between solr and ES
>> and eventually chose ES due to its cool merits.
>>
>> thanks
>>
>>
>> On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu  wrote:
>>
>>> There is no secondary index support in HBase at the moment.
>>>
>>> It's on our road map.
>>>
>>> FYI
>>>
>>> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li  wrote:
>>>
>>> > Jacques,
>>> >
>>> > Yes. But I still have questions about that.
>>> >
>>> > In my system, when users search with a keyword arbitrarily, the query
>>> is
>>> > forwarded to Solr. No any updating operations but appending new indexes
>>> > exist in Solr managed data.
>>> >
>>> > When I need to retrieve data based on ranking values, HBase is used.
>>> And,
>>> > the ranking values need to be updated all the time.
>>> >
>>> > Is that correct?
>>> >
>>> > My question is that the performance must be low if keeping consistency
>>> in a
>>> > large scale distributed environment. How does HBase handle this issue?
>>> >
>>> > Thanks so much!
>>> >
>>> > Bing
>>> >
>>> >
>>> > On Thu, Feb 23, 2012 at 1:17 AM, Jacques  wrote:
>>> >
>>> > > It is highly unlikely that you could replace Solr with HBase.
>>>  They're
>>> > > really apples and oranges.
>>> > >
>>> > >
>>> > > On Wed, Feb 22, 2012 at 1:09 AM, Bing Li  wrote:
>>> > >
>>> > >> Dear all,
>>> > >>
>>> > >> I wonder how data in HBase is indexed? Now Solr is used in my system
>>> > >> because data is managed in inverted index. Such an index is
>>> suitable to
>>> > >> retrieve unstructured and huge amount of data. How does HBase deal
>>> with
>>> > >> the
>>> > >> issue? May I replaced Solr with HBase?
>>> > >>
>>> > >> Thanks so much!
>>> > >>
>>> > >> Best regards,
>>> > >> Bing
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yonik Seeley
2012/2/22 Yury Kats :
> On 2/22/2012 12:25 PM, Yury Kats wrote:
>> I'm running into a problem with queries that contain forward slashes and 
>> more than one field.
>>
>> For example, these queries work fine:
>> fieldName:/a
>> fieldName:/*
>>
>> But if I have two fields with similar syntax in the same query, it fails.
>>
>> For simplicity, I'm using the same field twice:
>>
>> fieldName:/a fieldName:/a
>
> Looks like escaping forward slashes makes the query work, eg
>  fieldName:\/a fieldName:\/a
>
> This is a bit puzzling as the forward slash is not part of the query 
> language, is it?

Regex queries were added that use forward slashes:

https://issues.apache.org/jira/browse/LUCENE-2604

-Yonik
lucidimagination.com


Re: Unusually long data import time?

2012-02-22 Thread Ahmet Arslan
> Would it be unusual for an import of 160 million documents
> to take 18 hours?  Each document is less than 1kb and I
> have the DataImportHandler using the jdbc driver to connect
> to SQL Server 2008. The full-import query calls a stored
> procedure that contains only a select from my target table.
> 
> Is there any way I can speed this up? I saw recently someone
> on this list suggested a new user could get all their Solr
> data imported in under an hour. I sure hope that's true!

Do you have autoCommit or autoSoftCommit configured in solrconfig.xml?


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Em
That's strange.

Could you provide a sample dataset?

I'd like to try it out.

Kind regards,
Em

Am 22.02.2012 19:17, schrieb Yury Kats:
> On 2/22/2012 1:05 PM, Em wrote:
>> Yury,
>>
>> are you sure your request has a proper url-encoding?
> 
> Yes
> 


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:25 PM, Em wrote:
> That's strange.
> 
> Could you provide a sample dataset?

Data set does not matter. The query fails to parse, long before it gets to the 
data.


Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:24 PM, Yonik Seeley wrote:

>> This is a bit puzzling as the forward slash is not part of the query 
>> language, is it?
> 
> Regex queries were added that use forward slashes:
> 
> https://issues.apache.org/jira/browse/LUCENE-2604

Oh, so / is a special character now? I don't think it is mentioned as such on 
any of the wiki pages,
or in org.apache.solr.client.solrj.util.ClientUtils


RE: Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
Ahmet,

I do not. I commented autoCommit out.

Devon Baumgarten



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, February 22, 2012 12:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

> Would it be unusual for an import of 160 million documents
> to take 18 hours?  Each document is less than 1kb and I
> have the DataImportHandler using the jdbc driver to connect
> to SQL Server 2008. The full-import query calls a stored
> procedure that contains only a select from my target table.
> 
> Is there any way I can speed this up? I saw recently someone
> on this list suggested a new user could get all their Solr
> data imported in under an hour. I sure hope that's true!

Do have autoCommit or autoSoftCommit configured in solrconfig.xml? 


maxClauseCount error

2012-02-22 Thread Darren Govoni
Hi,
  I am suddenly getting a maxClauseCount error and don't know why. I am
using Solr 3.5.





maxClauseCount Exception

2012-02-22 Thread Darren Govoni
Hi,
  I am suddenly getting a maxClauseCount exception for no reason. I am
using Solr 3.5. I have only 206 documents in my index.

Any ideas? This is weird.

QUERY PARAMS: [hl, hl.snippets, hl.simple.pre, hl.simple.post, fl,
hl.mergeContiguous, hl.usePhraseHighlighter, hl.requireFieldMatch,
echoParams, hl.fl, q, rows, start]|#]


[#|2012-02-22T13:40:13.129-0500|INFO|glassfish3.1.1|
org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=Thread-2;|[]
webapp=/solr3 path=/select
params={hl=true&hl.snippets=4&hl.simple.pre=&fl=*,score&hl.mergeContiguous=true&hl.usePhraseHighlighter=true&hl.requireFieldMatch=true&echoParams=all&hl.fl=text_t&q={!lucene+q.op%3DOR+df%3Dtext_t}+(+kind_s:doc+OR+kind_s:xml)+AND+(type_s:[*+TO+*])+AND+(usergroup_sm:admin)&rows=20&start=0&wt=javabin&version=2}
 hits=204 status=500 QTime=166 |#]


[#|2012-02-22T13:40:13.131-0500|SEVERE|glassfish3.1.1|
org.apache.solr.servlet.SolrDispatchFilter|
_ThreadID=22;_ThreadName=Thread-2;|org.apache.lucene.search.BooleanQuery
$TooManyClauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at org.apache.lucene.search.ScoringRewrite
$1.addClause(ScoringRewrite.java:51)
at org.apache.lucene.search.ScoringRewrite
$1.addClause(ScoringRewrite.java:41)
at org.apache.lucene.search.ScoringRewrite
$3.collect(ScoringRewrite.java:95)
at
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
at
org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
at
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:304)
at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:385)
at
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
at
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
at
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at org.apache.so



Re: Problem parsing queries with forward slashes and multiple fields

2012-02-22 Thread Yury Kats
On 2/22/2012 1:24 PM, Yonik Seeley wrote:
>> Looks like escaping forward slashes makes the query work, eg
>>  fieldName:\/a fieldName:\/a
>>
>> This is a bit puzzling as the forward slash is not part of the query 
>> language, is it?
> 
> Regex queries were added that use forward slashes:
> 
> https://issues.apache.org/jira/browse/LUCENE-2604

Looks like regex matching happens across multiple fields though. Feels like a 
bug to me?


Re: Unusually long data import time?

2012-02-22 Thread eks dev
Devon, you ought to try updating from many threads (I do not know if
DIH can do it, check it), but Lucene does a great job if fed from many
update threads...

It depends where your time gets lost, but it is usually a) the analysis chain
or b) the database.

If it is a), and your server has spare CPU cores, you can scale roughly with
the number of cores.
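
If you end up feeding Solr yourself instead of through DIH, something along
these lines is one way to get multithreaded updates from SolrJ (the URL, queue
size, thread count and field names are placeholders - just a sketch, not a
drop-in importer):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ThreadedIndexer {
    public static void main(String[] args) throws Exception {
        // 4 background threads, each draining a queue of 100 buffered docs
        StreamingUpdateSolrServer solr =
                new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name_s", "row " + i);
            solr.add(doc); // add() returns quickly; workers send batches behind it
        }
        solr.commit();
    }
}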

On Wed, Feb 22, 2012 at 7:41 PM, Devon Baumgarten
 wrote:
> Ahmet,
>
> I do not. I commented autoCommit out.
>
> Devon Baumgarten
>
>
>
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Wednesday, February 22, 2012 12:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Unusually long data import time?
>
>> Would it be unusual for an import of 160 million documents
>> to take 18 hours?  Each document is less than 1kb and I
>> have the DataImportHandler using the jdbc driver to connect
>> to SQL Server 2008. The full-import query calls a stored
>> procedure that contains only a select from my target table.
>>
>> Is there any way I can speed this up? I saw recently someone
>> on this list suggested a new user could get all their Solr
>> data imported in under an hour. I sure hope that's true!
>
> Do have autoCommit or autoSoftCommit configured in solrconfig.xml?


dih and solr cloud

2012-02-22 Thread eks dev
Out of curiosity, I'm trying to see if the new cloud features can replace what
I use now...

How is this (batch) update forwarding solved at the cloud level?

Imagine a simple one-shard, one-replica case: if I fire up a DIH update, is
this going to be replicated to the replica shard?
If yes, is it going to be:
- sent document by document (network-wise, imagine 100Mio+ update commands
going to the replica from the slave for big batches)
- somehow batched into "packages" to reduce load
- distributed at the index level somehow



This is an important case today with master/slave Solr replication, but it is
not mentioned at http://wiki.apache.org/solr/SolrCloud


Re: Fields, Facets, and Search Results

2012-02-22 Thread darul
Well, you probably need to clear your index first: remove the index directory,
restart your server and try again.
Let me know if it works or not.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fields-Facets-and-Search-Results-tp3765946p3767537.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fields, Facets, and Search Results

2012-02-22 Thread darul
And check your log file; you may have some errors at server startup,
due to some mistake - bad syntax in your schema file, for example...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fields-Facets-and-Search-Results-tp3765946p3767569.html
Sent from the Solr - User mailing list archive at Nabble.com.


org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out

2012-02-22 Thread Uomesh
Hi,

I am getting the error below while running a delta import, and my index is not
updated. Could you please let me know what might be causing this issue? I am
using Solr 3.5, and around 60+ documents are supposed to be updated by the
delta import.


 [org.apache.solr.handler.dataimport.SolrWriter] - Error creating document :
SolrInputDocument[...]
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
NativeFSLock@/var/solr/data/5159200/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1108)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:219)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at 
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:636)
at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:303)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:179)
at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:390)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:429)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/org-apache-lucene-store-LockObtainFailedException-Lock-obtain-timed-out-tp3767605p3767605.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out

2012-02-22 Thread Sethi, Parampreet
Hi Uomesh,

I was facing similar issues few days ago and was able to resolve it by
deleting the lock file created in the index directory and restarting my
solr server.

I have documented the same in one of the posts at
http://www.params.me/2011/12/solr-index-lock-issue.html

Hope it helps!

-param

On 2/22/12 2:36 PM, "Uomesh"  wrote:

>Hi,
>
>I am getting below error while running delta import and my index is not
>updated. Could you please let me know what might be causing this issue? I
>am
>using Solr 3.5 version and around 60+ documents suppose to be updated
>using
>delta import.
>
>
> [org.apache.solr.handler.dataimport.SolrWriter] - Error creating
>document :
>SolrInputDocument[...]
>org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
>NativeFSLock@/var/solr/data/5159200/index/write.lock
>at org.apache.lucene.store.Lock.obtain(Lock.java:84)
>at org.apache.lucene.index.IndexWriter.(IndexWriter.java:1108)
>at 
>org.apache.solr.update.SolrIndexWriter.(SolrIndexWriter.java:83)
>at
>org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.j
>ava:101)
>at
>org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler
>2.java:171)
>at
>org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.ja
>va:219)
>at
>org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdatePr
>ocessorFactory.java:61)
>at
>org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdatePr
>ocessorFactory.java:115)
>at 
>org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
>at
>org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHa
>ndler.java:293)
>at
>org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.jav
>a:636)
>at
>org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:303)
>at
>org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:179)
>at
>org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter
>.java:390)
>at
>org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:4
>29)
>at
>org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:40
>8)
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/org-apache-lucene-store-LockObtainFaile
>dException-Lock-obtain-timed-out-tp3767605p3767605.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Performance Improvement and degradation Help

2012-02-22 Thread naptowndev
As an update to this... I tried running a query against the
4.0.0.2010.12.10.08.54.56 version and the newer 4.0.0.2012.02.16 version (both
on the same box).  The query params were the same and the returned results were
the same, but the 4.0.0.2010.12.10.08.54.56 version returned the results in
about 1.6 seconds while the newer (4.0.0.2012.02.16) version returned them in
about 4 seconds.

If I add the wildcard field list to the newer version, the time increases by
anywhere from 0.5 to 1 second.

These are all averages after running the queries several times over a 30-minute
period (allowing for warming and caching).

Anybody have any insight into why the newer versions are performing a bit
slower?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Improvement-and-degradation-Help-tp3767015p3767725.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.5 and indexing performance

2012-02-22 Thread mizayah
I have it all commented out in the updateHandler; I'm pretty sure there is no
default autoCommit.
 



iorixxx wrote
> 
>> I wanted to switch to new version of solr, exactelly to 3.5
>> but im getting
>> big drop of indexing speed.
> 
> Could it be   configuration in solrconfig.xml?
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-5-and-indexing-performance-tp3766653p3767843.html
Sent from the Solr - User mailing list archive at Nabble.com.


result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   I 
have a test checking for a search result in Solr, and the test passes in Solr 
1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I just 
included output from lucene QueryParser to prove the document exists and is 
found 

I am completely stumped.


Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser: 

URL:   q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the 
antholog" in 1064395), product of:
  1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
antholog"), product of:
48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 
revolv=872 through=81366 the=3531140 antholog=11611)
0.02063975 = queryNorm
  6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the 
antholog" in 1064395), product of:
1.0 = tf(phraseFreq=1.0)
48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 
revolv=872 through=81366 the=3531140 antholog=11611)
0.125 = fieldNorm(field=all_search, doc=1064395)

dismax QueryParser:   
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
through the Anthology"
final query:   +(all_search:"the beatl as musician revolv through the 
antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
antholog"~3)~0.01

(no matches)


***Solr 1.4***

lucene QueryParser:   

URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 = fieldNorm(field=all_search, doc=3469163)

dismax QueryParser:   
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
through the Anthology"
final query:  +(all_search:"the beatl as musician revolv through the 
antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
antholog"~3)~0.01

score:

7.449651 = (MATCH) sum of:
  3.7248254 = weight(all_search:"the beatl as musician revolv through the 
antholog"~1 in 3469163), product of:
0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
the antholog"~1), product of:
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
  0.014681898 = queryNorm
5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
the antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 = fieldNorm(field=all_search, doc=3469163)
  3.7248254 = weight(all_search:"the beatl as musician revolv through the 
antholog"~3 in 3469163), product of:
0.7071068 = queryWeight(all_search:"the beatl as musician revolv through 
the antholog"~3), product of:
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
  0.014681898 = queryNorm
5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through 
the antholog" in 3469163), product of:
  1.0 = tf(phraseFreq=1.0)
  48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
  0.109375 = fieldNorm(field=all_search, doc=3469163)





RE: Unusually long data import time?

2012-02-22 Thread Devon Baumgarten
Thank you everyone for your patience and suggestions.

It turns out I was doing something really unreasonable in my schema. I 
mistakenly edited the max EdgeNgram size to 512, when I meant to set the 
lengthFilter max to 512. I brought this to a more reasonable number, and my 
estimated time to import is now down to 4 hours. Based on the size of my record 
set, this time is more consistent with Walter's observations in his own project.
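
For anyone curious why that one setting matters so much, here is a
back-of-the-envelope sketch (illustrative numbers only, not measured from this
import): an EdgeNGram filter emits one gram per length from minGramSize up to
min(maxGramSize, term length), so a huge max multiplies the token count for
every long value.

public class EdgeNGramCount {
    static int gramsFor(int termLength, int minGram, int maxGram) {
        int upper = Math.min(maxGram, termLength);
        return upper < minGram ? 0 : upper - minGram + 1;
    }

    public static void main(String[] args) {
        int termLength = 30; // e.g. a long name kept whole by KeywordTokenizer
        System.out.println("min=3, max=15 : " + gramsFor(termLength, 3, 15));  // 13
        System.out.println("min=3, max=512: " + gramsFor(termLength, 3, 512)); // 28
        // Multiplied across 160 million rows, the extra tokens add up fast.
    }
}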

Thanks again for your help,

Devon Baumgarten

-Original Message-
From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com] 
Sent: Wednesday, February 22, 2012 12:42 PM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unusually long data import time?

Ahmet,

I do not. I commented autoCommit out.

Devon Baumgarten



-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, February 22, 2012 12:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Unusually long data import time?

> Would it be unusual for an import of 160 million documents
> to take 18 hours?  Each document is less than 1kb and I
> have the DataImportHandler using the jdbc driver to connect
> to SQL Server 2008. The full-import query calls a stored
> procedure that contains only a select from my target table.
> 
> Is there any way I can speed this up? I saw recently someone
> on this list suggested a new user could get all their Solr
> data imported in under an hour. I sure hope that's true!

Do have autoCommit or autoSoftCommit configured in solrconfig.xml? 


Re: nutch and solr

2012-02-22 Thread alessio crisantemi
Thanks for your reply, but it doesn't work.
I get the same message: can't convert empty path

And also: impossible to find class org.apache.nutch.crawl.injector


Il giorno 22 febbraio 2012 06:14, tamanjit.bin...@yahoo.co.in <
tamanjit.bin...@yahoo.co.in> ha scritto:

> Try this command.
>
>  bin/nutch crawl urls/<foldername>/<filename>.txt -dir crawl/<folder name>
> -threads 10 -depth 2 -topN 1000
>
> Your folder structure will look like this:
>
> -- urls -- <foldername> -- <filename>.txt
>|
>|
> -- crawl -- <foldername>
>
> The folder name will be for different domains. So for each domain folder in
> urls folder there has to be a corresponding folder (with the same name) in
> the crawl folder.
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: 'location' fieldType indexation impossible

2012-02-22 Thread Erick Erickson
Make sure that your schema file is exactly the same on both
your local server and the remote server. Especially there should
be a dynamic field definition like (the tag itself was stripped by the mail
archive; this is the stock definition from the example schema):

<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

and you should see a couple of fields appear like
emploi_city_geoloc_0_coordinate and
emploi_city_geoloc_1_coordinate
when you index a "location" type in the field you indicated.

This has tripped me up in the past.

If that doesn't apply, then you need to provide more information,
more of the stack trace, what you've tried etc.

Because saying:
"I really don't understand why it isnt working because, it was working on my
local server with the same configuration (Solr 3.5.0) and the same database
!!!"

Is another way of saying "Something's different between
the two versions, I just don't know what yet" ...

So I'd start (make a backup first) by just copying my entire configuration
from my local machine to the remote one, restarting Solr and trying
again.

Best
Erick

On Wed, Feb 22, 2012 at 5:53 AM, Xavier  wrote:
> Hi,
>
> When i try to index my location field i get this error for each documents :
> *ATTENTION: Error creating document  Error adding field
> 'emploi_city_geoloc'='48.85,2.5525' *
> (so i have 0 files indexed)
>
> Here is my schema.xml :
> *<field name="emploi_city_geoloc" type="location" indexed="true" stored="false"/>*
>
> I really don't understand why it isnt working because, it was working on my
> local server with the same configuration (Solr 3.5.0) and the same database
> !!!
>
> If i try to use "geohash" instead of "location" it is working for
> indexation, but my geodist query in front isnt working anymore ...
>
> Any ideas ?
>
> Best regards,
> Xavier
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/location-fieldType-indexation-impossible-tp3766136p3766136.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Same id on two shards

2012-02-22 Thread jerry.min...@gmail.com
Hi,

I stumbled across this thread after running into the same question. The
answers presented here seem a little vague and I was hoping to renew the
discussion.

I am using a branch of Solr 4, doing distributed searching over 12 shards.
I want the documents in the first shard to always be selected over
documents that appear in the other 11 shards.

The queries to these shards looks something like this: "
http://solrserver/shard_1_app/select?shards=solr_server:/shard_1_app/,solr_server:/shard_2_app,
... ,solr_server:/shard_12_app&q=id:"

When I execute a query for an ID that I know exists in shard_1 and another
shard, I always get the result from shard 1.
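
For reference, here is a SolrJ equivalent of that query; the host, core names
and id value are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardOrderQuery {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://solrserver/shard_1_app");
        SolrQuery q = new SolrQuery("id:SOME_ID");
        // shard_1_app is listed first; the question is whether its copy of a
        // duplicate id is the one that wins.
        q.set("shards",
              "solrserver/shard_1_app,solrserver/shard_2_app,solrserver/shard_12_app");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults().getNumFound() + " result(s)");
    }
}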

Here's some questions that I have:
1. Has anyone rigorously tested the comment in the wiki "If docs with
duplicate unique keys are encountered, Solr will make an attempt to return
valid results, but the behavior may be non-deterministic."

2. Who is relying on this behavior (the document of the first shard is
returned) today? When do you notice the wrong document is selected? Do you
have a feeling for how frequently your distributed search returns the
document from a shard other than the first?

3. Is there a good web source other than the Solr wiki for information
about Solr distributed queries?


Thanks,
Jerry M.


On Mon, Aug 8, 2011 at 7:41 PM, simon  wrote:

> I think the first one to respond is indeed the way it works, but
> that's only deterministic up to a point (if your small index is in the
> throes of a commit and everything required for a response happens to
> be  cached on the larger shard ... who knows ?)
>
> On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey  wrote:
> > On 8/8/2011 4:07 PM, simon wrote:
> >>
> >> Only one should be returned, but it's non-deterministic. See
> >>
> >>
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
> >
> > I had heard it was based on which one responded first.  This is part of
> why
> > we have a small index that contains the newest content and only
> distribute
> > content to the other shards once a day.  The hope is that the small index
> > (less than 1GB, fits into RAM on that virtual machine) will always
> respond
> > faster than the other larger shards (over 18GB each).  Is this an
> incorrect
> > assumption on our part?
> >
> > The build system does do everything it can to ensure that periods of
> overlap
> > are limited to the time it takes to commit a change across all of the
> > shards, which should amount to just a few seconds once a day.  There
> might
> > be situations when the index gets out of whack and we have duplicate id
> > values for a longer time period, but in practice it hasn't happened yet.
> >
> > Thanks,
> > Shawn
> >
> >
>


need to support bi-directional synonyms

2012-02-22 Thread geeky2
hello all,

i need to support the following:

if the user enters "sprayer" in the desc field - then they get results for
BOTH "sprayer" and "washer".

and in the other direction

if the user enters "washer" in the desc field - then they get results for
BOTH "washer" and "sprayer". 

would i set up my synonym file like this?

assuming expand = true..

sprayer => washer
washer => sprayer

thank you,
mark

--
View this message in context: 
http://lucene.472066.n3.nabble.com/need-to-support-bi-directional-synonyms-tp3767990p3767990.html
Sent from the Solr - User mailing list archive at Nabble.com.


Trunk build errors

2012-02-22 Thread Darren Govoni
Hi,
  I am getting numerous errors preventing a build of solrcloud trunk.

 [licenses] MISSING LICENSE for the following file:


Any tips to get a clean build working?

thanks




Re: Fast Vector Highlighter Working for some records only

2012-02-22 Thread Koji Sekiguchi

Hi dhaivat,

I think you may want to use analysis.jsp:

http://localhost:8983/solr/admin/analysis.jsp

Go to the URL and look into how your custom tokenizer produces tokens,
and compare with the output of Solr's inbuilt tokenizer.

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/


(12/02/22 21:35), dhaivat wrote:


Koji Sekiguchi wrote


(12/02/22 11:58), dhaivat wrote:

Thanks for the reply,

But can you please tell me why it's working for some documents and not for
others?


As Solr 1.4.1 cannot recognize the hl.useFastVectorHighlighter flag, Solr just
ignores it, but because hl=true is there, Solr tries to create highlight
snippets using the existing (traditional, i.e. not FVH) Highlighter.
Highlighter (including FVH) sometimes cannot produce snippets for various
reasons; in that case you can use the hl.alternateField parameter.

http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/



Thank you so much for the explanation.

I have updated my Solr version and am using 3.5. Could you please tell me: when
I am using a custom Tokenizer on the field, do I need to make any changes
related to the Solr highlighter?

Here is my custom analyser:

(the fieldType/analyzer XML was stripped by the mailing list archive)

Here is the field info:

(the field definition was stripped as well)

I am creating tokens using my custom analyser, and when I try to use the
highlighter it's not working properly for the contents field; but when I tried
Solr's inbuilt tokenizer, the word was highlighted for the same query. Can you
please help me out with this?


Thanks in advance
Dhaivat





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fast-Vector-Highlighter-Working-for-some-records-only-tp3763286p3766335.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Jonathan Rochkind
So I don't really know what I'm talking about, and I'm not really sure 
if it's related or not, but your particular query:


"The Beatles as musicians : Revolver through the Anthology"

With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
problem I ran into. Now, I ran into it on 1.4.  I don't know why it 
would be different on 1.4 and 3.x. And I see you aren't even using a 
multi-field dismax in your sample query, so it couldn't possibly be what 
I ran into... I don't think. But I'll write this anyway in case it gives 
someone some ideas.


The problem I ran into is caused by different analysis in two fields 
both used in a dismax, one that ends up keeping ":" as a token, and one 
that doesn't.  Which ends up having the same effect as the famous 
'dismax stopwords problem'.


Maybe somehow your schema changed such to produce this problem in 3.x 
but not in 1.4? Although again I realize the fact that you are only 
using a single field in your demo dismax query kind of suggests it's not 
this problem. Wonder if you try the query without the ":", if the 
problem goes away, that might be a hint. Or, maybe someone more skilled 
at understanding what's in those Solr debug statements than I am (it's 
kind of all greek to me) will be able to take this hint and rule out or 
confirm that it may have something to do with your problem.


Here I write up the issue I ran into (which may or may not have anything 
to do with what you ran into)


http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/


Also, you don't say what your 'mm' is in your dismax queries, that could 
be relevant if it's got anything to do with anything similar to the 
issue I'm talking about.


Hmm, I wonder if Solr 3.x changes the way dismax calculates number of 
tokens for 'mm' in such a way that the 'varying field analysis dismax 
gotcha' can manifest with only one field, if the way dismax counts 
tokens for 'mm' differs from number of tokens the single field's 
analysis produces?


Jonathan

On 2/22/2012 2:55 PM, Naomi Dushay wrote:

I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   I 
have a test checking for a search result in Solr, and the test passes in Solr 
1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I just 
included output from lucene QueryParser to prove the document exists and is 
found

I am completely stumped.


Here are the debugQuery details:

***Solr 3.5***

lucene QueryParser:

URL:   q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through the 
antholog" in 1064395), product of:
   1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
antholog"), product of:
 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 
revolv=872 through=81366 the=3531140 antholog=11611)
 0.02063975 = queryNorm
   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through the 
antholog" in 1064395), product of:
 1.0 = tf(phraseFreq=1.0)
 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 musician=11805 
revolv=872 through=81366 the=3531140 antholog=11611)
 0.125 = fieldNorm(field=all_search, doc=1064395)

dismax QueryParser:
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the 
Anthology"
final query:   +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 
(all_search:"the beatl as musician revolv through the antholog"~3)~0.01

(no matches)


***Solr 1.4***

lucene QueryParser:

URL:  q=all_search:"The Beatles as musicians : Revolver through the Anthology"
final query:  all_search:"the beatl as musician revolv through the antholog"

5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
antholog" in 3469163), product of:
   1.0 = tf(phraseFreq=1.0)
   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 musician=11955 
revolv=820 through=88238 the=3542123 antholog=11205)
   0.109375 = fieldNorm(field=all_search, doc=3469163)

dismax QueryParser:
URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver through the 
Anthology"
final query:  +(all_search:"the beatl as musician revolv through the antholog"~1)~0.01 
(all_search:"the beatl as musician revolv through the antholog"~3)~0.01

score:

7.449651 = (MATCH) sum of:
   3.7248254 = weight(all_search:"the beatl as musician revolv through the 
antholog"~1 in 3469163), product of:
 0.7071068 = queryWeight(all_search:"the beatl as musician revolv through the 
antholog"~1), product of:
   48.16181 = idf(all_search: the=3542123 beatl=391 as=749890 
musician=11955 revolv=820 through=88238 the=3542123 antholog=11205)
   0.014681898 = queryNorm
 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
antholog" in 3469163), pr

Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
I forgot to include the field definition information:

schema.xml:
  

solr 3.5:
  
  

  



  


solr1.4:

  






  



And the analysis page shows the same results for Solr 3.5 and 1.4


Solr 3.5:

position      1     2      3     4         5       6        7     8
term text     the   beatl  as    musician  revolv  through  the   antholog
keyword       false false  false false     false   false    false false
startOffset   0     4      12    15        27      36       44    48
endOffset     3     11     14    24        35      43       47    57
type          word  word   word  word      word    word     word  word

Solr 1.4:

term position     1     2      3      4         5       6        7      8
term text         the   beatl  as     musician  revolv  through  the    antholog
term type         word  word   word   word      word    word     word   word
source start,end  0,3   4,11   12,14  15,24     27,35   36,43    44,47  48,57

- Naomi

--
View this message in context: 
http://lucene.472066.n3.nabble.com/result-present-in-Solr-1-4-but-missing-in-Solr-3-5-dismax-only-tp3767851p3768007.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: String search in Dismax handler

2012-02-22 Thread Erick Erickson
Two things:
1> What version of Solr are you using? I don't think qt=dismax is going to any
request handler.

2> What do you get when you add &debugQuery=on? Try that with
both queries and perhaps that will shed some light. If not, can you
post the results?

Best
Erick

On Wed, Feb 22, 2012 at 7:47 AM, mechravi25  wrote:
> Hi,
>
> The string I am searching is "Pass By Value".  I am using the qt=dismax (in
> the request query) as well.
>
>
> When I search the above string with the double quotes, the data is getting
> fetched
> but the same query string without any double quotes gives no results.
>
> Following is the dismax request handler in the solrconfig.xml
>
>
> 
>    
>     explicit
>
>     
>        id,score
>     
>
>     *:*
>
>
>     0
>
>     name
>     regex
>    
>  
>
>
>
> The same query string works fine with and without double quotes when I use
> default request handler
>
>
> Following is the default request handler in the solrconfig.xml
>
>  default="true">
>
>     
>       explicit
>
>     
>  
>
>
> Please provide some suggestions as to why the string search without quotes
> is returning no records
> when dismax handler is used. Am I missing out on something?
>
> Thanks.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/String-search-in-Dismax-handler-tp3766360p3766360.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
Jonathan,

I have the same problem without the colon - I tested that, but didn't mention 
it.   

mm can't be the issue either:   in Solr 3.5, if I remove one of the occurrences 
of "the"  (doesn't matter which), I get results.  Removing any other word does 
NOT get results.   And if the query isn't a phrase query, it gets results.

And no, it can't be related to what you refer to as the  "dismax stopwords 
problem", since i can demonstrate the problem with a single field.  mm can't be 
the issue 


I have run into problems in the past with a non-alpha character surrounded by 
spaces tanking my search results for dismax … but I fixed that with this 
fieldType:






  


  
  




  


My stopwords_punctuation.txt file is

#Punctuation characters we want to ignore in queries
:
;
&
/

and used this type instead of string for fields in my dismax qf. Thus, the 
punctuation "terms" in the query are not present for the fields that were 
formerly string fields.

- Naomi

On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if 
> it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
> problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be 
> different on 1.4 and 3.x. And I see you aren't even using a multi-field 
> dismax in your sample query, so it couldn't possibly be what I ran into... I 
> don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both 
> used in a dismax, one that ends up keeping ":" as a token, and one that 
> doesn't.  Which ends up having the same effect as the famous 'dismax 
> stopwords problem'.
> 
> Maybe somehow your schema changed such to produce this problem in 3.x but not 
> in 1.4? Although again I realize the fact that you are only using a single 
> field in your demo dismax query kind of suggests it's not this problem. 
> Wonder if you try the query without the ":", if the problem goes away, that 
> might be a hint. Or, maybe someone more skilled at understanding what's in 
> those Solr debug statements than I am (it's kind of all greek to me) will be 
> able to take this hint and rule out or confirm that it may have something to 
> do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to 
> do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries, that could be 
> relevant if it's got anything to do with anything similar to the issue I'm 
> talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens 
> for 'mm' in such a way that the 'varying field analysis dismax gotcha' can 
> manifest with only one field, if the way dismax counts tokens for 'mm' 
> differs from number of tokens the single field's analysis produces?
> 
> Jonathan
> 
> On 2/22/2012 2:55 PM, Naomi Dushay wrote:
>> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   
>> I have a test checking for a search result in Solr, and the test passes in 
>> Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I 
>> just included output from lucene QueryParser to prove the document exists 
>> and is found
>> 
>> I am completely stumped.
>> 
>> 
>> Here are the debugQuery details:
>> 
>> ***Solr 3.5***
>> 
>> lucene QueryParser:
>> 
>> URL:   q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>   1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
>> antholog"), product of:
>> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>> 0.02063975 = queryNorm
>>   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>> 1.0 = tf(phraseFreq=1.0)
>> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>> 0.125 = fieldNorm(field=all_search, doc=1064395)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:   +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> (no matches)
>> 
>> 
>> **

Re: SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-22 Thread Mark Miller
Looks like an issue around replication IndexWriter reboot, soft commits and 
hard commits.

I think I've got a workaround for it:

Index: solr/core/src/java/org/apache/solr/handler/SnapPuller.java
===
--- solr/core/src/java/org/apache/solr/handler/SnapPuller.java  (revision 
1292344)
+++ solr/core/src/java/org/apache/solr/handler/SnapPuller.java  (working copy)
@@ -499,6 +499,17 @@
   
   // reboot the writer on the new index and get a new searcher
   solrCore.getUpdateHandler().newIndexWriter();
+  Future[] waitSearcher = new Future[1];
+  solrCore.getSearcher(true, false, waitSearcher, true);
+  if (waitSearcher[0] != null) {
+try {
+ waitSearcher[0].get();
+   } catch (InterruptedException e) {
+ SolrException.log(LOG,e);
+   } catch (ExecutionException e) {
+ SolrException.log(LOG,e);
+   }
+ }
   // update our commit point to the right dir
   solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
 
That should allow the searcher that the following commit command prompts to see 
the *new* IndexWriter.

On Feb 22, 2012, at 10:56 AM, eks dev wrote:

> We started observing strange failures from ReplicationHandler when we
> commit on master, trunk version 4-5 days old.
> It works sometimes and sometimes not; we didn't dig deeper yet.
> 
> Looks like the real culprit hides behind:
> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
> 
> Looks familiar to somebody?
> 
> 
> 120222 154959 SEVERE SnapPull failed
> :org.apache.solr.common.SolrException: Error opening new searcher
>at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
>at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
>at org.apache.solr.update.DirectUpdateHandler2.commit(Unknown Source)
>at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:503)
>at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:348)
>at org.apache.solr.handler.ReplicationHandler.doFetch(Unknown Source)
>at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:163)
>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>at java.lang.Thread.run(Thread.java:722)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed
>at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:810)
>at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:815)
>at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:3984)
>at 
> org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:254)
>at 
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:233)
>at 
> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:223)
>at 
> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
>at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1095)
>... 15 more

- Mark Miller
lucidimagination.com













Re: Solr Highlighting not working with PayloadTermQueries

2012-02-22 Thread Koji Sekiguchi

(12/02/22 7:53), Nitin Arora wrote:

Hi,

I'm using SOLR and Lucene in my application for search.

I'm facing an issue where highlighting using FastVectorHighlighter does not work
when I use PayloadTermQueries as clauses of a BooleanQuery.

After debugging I found that in DefaultSolrHighlighter.java,
fvh.getFieldQuery does not return any terms in the termMap.

FastVectorHighlighter fvh = new FastVectorHighlighter(
 // FVH cannot process hl.usePhraseHighlighter parameter per-field
basis
 params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
 // FVH cannot process hl.requireFieldMatch parameter per-field basis
 params.getBool( HighlightParams.FIELD_MATCH, false ) );

FieldQuery fieldQuery = fvh.getFieldQuery( query );

The reason for the empty termMap is that PayloadTermQuery is discarded while
constructing the FieldQuery.

void flatten( Query sourceQuery, Collection<Query> flatQueries ){
  if( sourceQuery instanceof BooleanQuery ){
    BooleanQuery bq = (BooleanQuery)sourceQuery;
    for( BooleanClause clause : bq.getClauses() ){
      if( !clause.isProhibited() )
        flatten( clause.getQuery(), flatQueries );
    }
  }
  else if( sourceQuery instanceof DisjunctionMaxQuery ){
    DisjunctionMaxQuery dmq = (DisjunctionMaxQuery)sourceQuery;
    for( Query query : dmq ){
      flatten( query, flatQueries );
    }
  }
  else if( sourceQuery instanceof TermQuery ){
    if( !flatQueries.contains( sourceQuery ) )
      flatQueries.add( sourceQuery );
  }
  else if( sourceQuery instanceof PhraseQuery ){
    if( !flatQueries.contains( sourceQuery ) ){
      PhraseQuery pq = (PhraseQuery)sourceQuery;
      if( pq.getTerms().length > 1 )
        flatQueries.add( pq );
      else if( pq.getTerms().length == 1 ){
        flatQueries.add( new TermQuery( pq.getTerms()[0] ) );
      }
    }
  }
  // else discard queries
}

What is the best way to get highlighting working with Payload Term Queries?


Hi Nitin,

Thank you for reporting this problem! Your assumption is correct:
FVH discards PayloadTermQueries in the flatten() method.

Though I'm not so familiar with SpanQueries, it looks like SpanTermQuery, which is
the superclass of PayloadTermQuery, has a getTerm() method. Do you think that if
flatten() could recognize SpanTermQuery and add its term to flatQueries, it would
solve your problem?

If so, please open a jira ticket. And if you can, attach a patch would help a 
lot!

koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
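For illustration, a minimal sketch of the branch Koji suggests adding to flatten(),
assuming it is enough to turn the SpanTermQuery's term into a plain TermQuery
(untested, not an official patch):

  else if( sourceQuery instanceof SpanTermQuery ){
    // PayloadTermQuery extends SpanTermQuery, so this branch also covers payload term queries.
    // Flatten it to a plain TermQuery so FVH can collect the term into its termMap.
    TermQuery tq = new TermQuery( ((SpanTermQuery)sourceQuery).getTerm() );
    if( !flatQueries.contains( tq ) )
      flatQueries.add( tq );
  }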


Do nested entities have a representation in Solr indexes?

2012-02-22 Thread Mike O'Leary
The data-config.xml file that I have for indexing database contents has nested 
entity nodes within a document node, and each of the entities contains field 
nodes. Lucene indexes consist of documents that contain fields. What about 
entities? If you change the way entities are structured in a data-config.xml 
file, in what way (if any) does it change how the contents are stored in the 
index? When I created the entities I am using, and defined the fields in one of 
the inner entities to be multivalued, I thought that the fields of that entity 
type would be grouped logically somehow in the index, but then I remembered 
that Lucene doesn't have a concept of sub-documents (that I know of), so each 
of the field values will be added to a list, and the extent of the logical 
grouping would be that the field values that were indexed together would be at 
the same position in their respective lists. Am I understanding this right, or 
do entities as defined in data-config.xml have some kind of representation in 
the index like document and field do?
Thanks,
Mike


RE: Recovering from database connection resets in DataimportHandler

2012-02-22 Thread Mike O'Leary
Could you point me to the most non-intimidating introduction to SolrJ that you 
know of? I have a passing familiarity with Javascript and, with few exceptions, 
I haven't developed software that has a graphical user interface of any kind 
in about 25 years. I like the idea of having finer control over data imported 
from a database though.
Thanks,
Mike

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, February 13, 2012 6:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Recovering from database connection resets in DataimportHandler

I'd seriously consider using SolrJ and your favorite JDBC driver instead. It's 
actually quite easy to create one, although as always it may be a bit 
intimidating to get started. This allows you much finer control over error  
conditions than DIH does, so may be more suited to your needs.

Best
Erick

On Sat, Feb 11, 2012 at 2:40 AM, Mike O'Leary  wrote:
> I am trying to use Solr's DataImportHandler to index a large number of 
> database records in a SQL Server database that is owned and managed by a 
> group we are collaborating with. The indexing jobs I have run so far, except 
> for the initial very small test runs, have failed due to database connection 
> resets. I have gotten indexing jobs to go further by using 
> CachedSqlEntityProcessor and specifying responseBuffering=adaptive in the 
> connection url, but I think in order to index that data I'm going to have to 
> work out how to catch database connection reset exceptions and resubmit the 
> queries that failed. Can anyone can suggest a good way to approach this? Or 
> have any of you encountered this problem and worked out a solution to it 
> already?
> Thanks,
> Mike


Re: distributed deletes working?

2012-02-22 Thread Jamie Johnson
I know everyone is busy, but I was wondering if anyone had found
anything with this?  Any suggestions on what I could be doing wrong
would be greatly appreciated.

On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller  wrote:
>
> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>
>> id field is a UUID.
>
> Strange - was using UUID's myself in same test this morning...
>
> I'll try again soon.
>
> - Mark Miller
> lucidimagination.com


Is there a way to write a DataImportHandler deltaQuery that compares contents still to be imported to contents in the index?

2012-02-22 Thread Mike O'Leary
I am working on indexing the contents of a database that I don't have 
permission to alter. In particular, the DataImportHandler examples that show 
how to specify a deltaQuery attribute value show database tables that have a 
last_modified column, and it compares these values with last_index_time values 
stored in the dataimport.properties file. The tables in the database I am 
working with don't have anything like a last_modified column. An indexing job I 
was running yesterday failed, and I would like to restart it so that it only 
imports the data that it hasn't already indexed. As a one-off, I could create a 
list of the keys of the database records that have been indexed and hack in 
something that reads that list as part of how it figures out what to index, but 
I was wondering if there is something built in that would allow me to do the 
same kind of comparison in a likely far more elegant way. What kinds of 
information do the deltaQuery attributes have access to, apart from the 
database tables, columns, etc., and do they have access to any information that 
would help me with what I want to do?
Thanks,
Mike

P.S. While we're on the subject of delta... attributes, can someone explain to 
me what the difference is between the deltaQuery and the deltaImportQuery 
attributes?
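For reference, the pattern from the DIH examples mentioned above looks roughly like
the sketch below (table and column names are assumptions, not the actual schema):
deltaQuery returns only the primary keys of rows changed since
${dataimporter.last_index_time}, while deltaImportQuery then fetches the full row
for each returned key via ${dih.delta.id}.

<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item WHERE id = '${dih.delta.id}'">
  <field column="id" name="id"/>
  <field column="name" name="name"/>
</entity>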


Re: distributed deletes working?

2012-02-22 Thread Mark Miller
Yonik did fix an issue around peer sync and deletes a few days ago - long 
chance that was involved?

Otherwise, neither Sami nor I have replicated these results so far.

On Feb 22, 2012, at 8:56 PM, Jamie Johnson wrote:

> I know everyone is busy, but I was wondering if anyone had found
> anything with this?  Any suggestions on what I could be doing wrong
> would be greatly appreciated.
> 
> On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller  wrote:
>> 
>> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>> 
>>> id field is a UUID.
>> 
>> Strange - was using UUID's myself in same test this morning...
>> 
>> I'll try again soon.
>> 
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com













Re: distributed deletes working?

2012-02-22 Thread Jamie Johnson
Perhaps if you could give me the steps you're using to test I can find
an error in what I'm doing.


On Wed, Feb 22, 2012 at 9:24 PM, Mark Miller  wrote:
> Yonik did fix an issue around peer sync and deletes a few days ago - long 
> chance that was involved?
>
> Otherwise, neither Sami nor I have replicated these results so far.
>
> On Feb 22, 2012, at 8:56 PM, Jamie Johnson wrote:
>
>> I know everyone is busy, but I was wondering if anyone had found
>> anything with this?  Any suggestions on what I could be doing wrong
>> would be greatly appreciated.
>>
>> On Fri, Feb 17, 2012 at 4:08 PM, Mark Miller  wrote:
>>>
>>> On Feb 17, 2012, at 3:56 PM, Jamie Johnson wrote:
>>>
 id field is a UUID.
>>>
>>> Strange - was using UUID's myself in same test this morning...
>>>
>>> I'll try again soon.
>>>
>>> - Mark Miller
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com
>


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Naomi Dushay
Jonathan has brought it to my attention that BOTH of my failing searches happen 
to have 8 terms, and one of the terms is repeated:

 "The Beatles as musicians : Revolver through the Anthology"
 "Color-blindness [print/digital]; its dangers and its detection"

but this is a PHRASE search.  

In case it's relevant, both Solr 1.4 and Solr 3.5:
 do NOT use stopwords in the fieldtype;  
 mm is  6<-1 6<90%  for dismax
 qs is 1
 ps is 3

And both use this filter last



… but I believe that filter is only used for consecutive tokens.

Lastly, 

 "Color-blindness [print/digital]; its and its detection"   works   ("dangers"
is removed, rather than one of the repeated "its")

- Naomi



On Feb 22, 2012, at 3:41 PM, Jonathan Rochkind wrote:

> So I don't really know what I'm talking about, and I'm not really sure if 
> it's related or not, but your particular query:
> 
> "The Beatles as musicians : Revolver through the Anthology"
> 
> With the lone "word" that's a ':', reminds me of a dismax stopwords-type 
> problem I ran into. Now, I ran into it on 1.4.  I don't know why it would be 
> different on 1.4 and 3.x. And I see you aren't even using a multi-field 
> dismax in your sample query, so it couldn't possibly be what I ran into... I 
> don't think. But I'll write this anyway in case it gives someone some ideas.
> 
> The problem I ran into is caused by different analysis in two fields both 
> used in a dismax, one that ends up keeping ":" as a token, and one that 
> doesn't.  Which ends up having the same effect as the famous 'dismax 
> stopwords problem'.
> 
> Maybe somehow your schema changed such to produce this problem in 3.x but not 
> in 1.4? Although again I realize the fact that you are only using a single 
> field in your demo dismax query kind of suggests it's not this problem. 
> Wonder if you try the query without the ":", if the problem goes away, that 
> might be a hint. Or, maybe someone more skilled at understanding what's in 
> those Solr debug statements than I am (it's kind of all greek to me) will be 
> able to take this hint and rule out or confirm that it may have something to 
> do with your problem.
> 
> Here I write up the issue I ran into (which may or may not have anything to 
> do with what you ran into)
> 
> http://bibwild.wordpress.com/2011/06/15/more-dismax-gotchas-varying-field-analysis-and-mm/
> 
> 
> Also, you don't say what your 'mm' is in your dismax queries, that could be 
> relevant if it's got anything to do with anything similar to the issue I'm 
> talking about.
> 
> Hmm, I wonder if Solr 3.x changes the way dismax calculates number of tokens 
> for 'mm' in such a way that the 'varying field analysis dismax gotcha' can 
> manifest with only one field, if the way dismax counts tokens for 'mm' 
> differs from number of tokens the single field's analysis produces?
> 
> Jonathan
> 
> On 2/22/2012 2:55 PM, Naomi Dushay wrote:
>> I am working on upgrading Solr from 1.4 to 3.5, and I have hit a problem.   
>> I have a test checking for a search result in Solr, and the test passes in 
>> Solr 1.4, but fails in Solr 3.5.   Dismax is the desired QueryParser -- I 
>> just included output from lucene QueryParser to prove the document exists 
>> and is found
>> 
>> I am completely stumped.
>> 
>> 
>> Here are the debugQuery details:
>> 
>> ***Solr 3.5***
>> 
>> lucene QueryParser:
>> 
>> URL:   q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 6.0562754 = (MATCH) weight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>>   1.0 = queryWeight(all_search:"the beatl as musician revolv through the 
>> antholog"), product of:
>> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>> 0.02063975 = queryNorm
>>   6.0562754 = fieldWeight(all_search:"the beatl as musician revolv through 
>> the antholog" in 1064395), product of:
>> 1.0 = tf(phraseFreq=1.0)
>> 48.450203 = idf(all_search: the=3531140 beatl=398 as=645923 
>> musician=11805 revolv=872 through=81366 the=3531140 antholog=11611)
>> 0.125 = fieldNorm(field=all_search, doc=1064395)
>> 
>> dismax QueryParser:
>> URL:  qf=all_search&pf=all_search&q="The Beatles as musicians : Revolver 
>> through the Anthology"
>> final query:   +(all_search:"the beatl as musician revolv through the 
>> antholog"~1)~0.01 (all_search:"the beatl as musician revolv through the 
>> antholog"~3)~0.01
>> 
>> (no matches)
>> 
>> 
>> ***Solr 1.4***
>> 
>> lucene QueryParser:
>> 
>> URL:  q=all_search:"The Beatles as musicians : Revolver through the 
>> Anthology"
>> final query:  all_search:"the beatl as musician revolv through the antholog"
>> 
>> 5.2676983 = fieldWeight(all_search:"the beatl as musician revolv through the 
>> antholog" in 3469163), product of:
>>   1.0 = tf(phraseFreq=1.0)
>>   48.16181 = idf(a

Re: Recovering from database connection resets in DataimportHandler

2012-02-22 Thread Erick Erickson
It *just happens* that I wrote a blog on this very topic, see:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

That code contains two rather different methods, one that indexes
based on a SQL database and one based on indexing random files
with client-side Tika.

Best
Erick
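For anyone who wants a feel for the approach before reading the blog post, here is a
minimal, self-contained sketch (not the blog's code; the Solr URL, JDBC URL, table
and field names are all assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
    Connection conn = DriverManager.getConnection(
        "jdbc:sqlserver://dbhost;databaseName=mydb", "user", "pass");
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM documents");

    // Iterate the result set and index in batches. If the JDBC connection resets
    // partway through, you could catch the exception around this loop, reopen the
    // connection, and resume from the last indexed id; DIH gives you no such hook.
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    while (rs.next()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", rs.getString("id"));
      doc.addField("title", rs.getString("title"));
      doc.addField("text", rs.getString("body"));
      docs.add(doc);
      if (docs.size() == 1000) {
        solr.add(docs);
        docs.clear();
      }
    }
    if (!docs.isEmpty()) solr.add(docs);
    solr.commit();

    rs.close();
    stmt.close();
    conn.close();
  }
}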

On Wed, Feb 22, 2012 at 8:51 PM, Mike O'Leary  wrote:
> Could you point me to the most non-intimidating introduction to SolrJ that 
> you know of? I have a passing familiarity with Javascript and, with few 
> exceptions, I haven't developed software that has a graphical user interface 
> of any kind in about 25 years. I like the idea of having finer control over 
> data imported from a database though.
> Thanks,
> Mike
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, February 13, 2012 6:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Recovering from database connection resets in DataimportHandler
>
> I'd seriously consider using SolrJ and your favorite JDBC driver instead. 
> It's actually quite easy to create one, although as always it may be a bit 
> intimidating to get started. This allows you much finer control over error  
> conditions than DIH does, so may be more suited to your needs.
>
> Best
> Erick
>
> On Sat, Feb 11, 2012 at 2:40 AM, Mike O'Leary  wrote:
>> I am trying to use Solr's DataImportHandler to index a large number of 
>> database records in a SQL Server database that is owned and managed by a 
>> group we are collaborating with. The indexing jobs I have run so far, except 
>> for the initial very small test runs, have failed due to database connection 
>> resets. I have gotten indexing jobs to go further by using 
>> CachedSqlEntityProcessor and specifying responseBuffering=adaptive in the 
>> connection url, but I think in order to index that data I'm going to have to 
>> work out how to catch database connection reset exceptions and resubmit the 
>> queries that failed. Can anyone can suggest a good way to approach this? Or 
>> have any of you encountered this problem and worked out a solution to it 
>> already?
>> Thanks,
>> Mike


Re: result present in Solr 1.4, but missing in Solr 3.5, dismax only

2012-02-22 Thread Robert Muir
On Wed, Feb 22, 2012 at 7:35 PM, Naomi Dushay  wrote:
> Jonathan has brought it to my attention that BOTH of my failing searches 
> happen to have 8 terms, and one of the terms is repeated:
>
>  "The Beatles as musicians : Revolver through the Anthology"
>  "Color-blindness [print/digital]; its dangers and its detection"
>
> but this is a PHRASE search.
>

Can you take your same phrase queries, and simply add some slop to
them (e.g. ~3) and ensure they still match with the lucene
queryparser? SloppyPhraseQuery has a bit of a history with repeats
since Lucene 2.9 that you were using.
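For example, with the lucene parser the slop goes after the closing quote, something
like:

q=all_search:"The Beatles as musicians : Revolver through the Anthology"~3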

https://issues.apache.org/jira/browse/LUCENE-3068
https://issues.apache.org/jira/browse/LUCENE-3215
https://issues.apache.org/jira/browse/LUCENE-3412

-- 
lucidimagination.com


default fq in dismax request handler being overridden

2012-02-22 Thread dboychuck
I have a dismax request handler with a default fq parameter.



explicit
0.01

sku^9.0 upc^9.1 searchKeyword^1.9 series^2.8 productTitle^1.2 productID^9.0
manufacturer^4.0 masterFinish^1.5 theme^1.1 categoryName^2.0 finish^1.4


searchKeyword^2.1 text^0.2 productTitle^1.5 manufacturer^4.0 finish^1.9

isTopSeller:true^1.30
linear(popularity,1,2)^3.0
productID,manufacturer
3<-1 5<-2 6<90%
100
3
discontinued:false



I understand that when I send a search request such as
 /select?qt=dismax&q=f-0&sort=score%20desc&fq=type_string:"faucet"
the fq I pass in overrides the default fq defined in the request handler.

What I would like to know is whether there is a way to always include the fq that
is defined in the request handler, so that it is not overridden but instead appended
automatically to any Solr searches that use the request handler.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/default-fq-in-dismax-request-handler-being-overridden-tp3768735p3768735.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: need to support bi-directional synonyms

2012-02-22 Thread remi tassing
Same question here...

On Wednesday, February 22, 2012, geeky2  wrote:
> hello all,
>
> i need to support the following:
>
> if the user enters "sprayer" in the desc field - then they get results for
> BOTH "sprayer" and "washer".
>
> and in the other direction
>
> if the user enters "washer" in the desc field - then they get results for
> BOTH "washer" and "sprayer".
>
> would i set up my synonym file like this?
>
> assuming expand = true..
>
> sprayer => washer
> washer => sprayer
>
> thank you,
> mark
>
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/need-to-support-bi-directional-synonyms-tp3767990p3767990.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: default fq in dismax request handler being overridden

2012-02-22 Thread dboychuck
Think I answered my own question... I need to use an appends list
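For anyone finding this thread later, a minimal sketch of what that looks like inside
the requestHandler definition, reusing the fq value from the config above (untested):

<lst name="appends">
  <str name="fq">discontinued:false</str>
</lst>

Parameters listed under "appends" are added to every request instead of acting as
overridable defaults; there is also an "invariants" list for parameters that clients
should not be able to change at all.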

--
View this message in context: 
http://lucene.472066.n3.nabble.com/default-fq-in-dismax-request-handler-being-overridden-tp3768735p3768817.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Development inside or outside of Solr?

2012-02-22 Thread bing
Hi, François Schiettecatte

Thank you for the reply all the same, but I chose to stick with Solr
(wrapped with the Tika language API) and make changes outside Solr. 

Best Regards, 
Bing 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3768903.html
Sent from the Solr - User mailing list archive at Nabble.com.


problem with parsering (using Tika) on remote glassfish

2012-02-22 Thread ola nowak
Hi all!
I'm using the Tika parser to index my files into Solr. I created my own parser
(which extends XMLParser), and it uses my own mimetype.
I created a jar file whose contents look like this:
src
|- main
   |- some_packages
      |- MyParser.java
   |- resources
      |- META-INF
         |- services
            |- org.apache.tika.parser.Parser (which contains some_packages.MyParser.java)
      |- org
         |- apache
            |- tika
               |- mime
                  |- custom-mimetypes.xml


In custom-mimetypes.xml I put the definition of the new mimetype because my XML
files have some special tags.

Now here is the problem: I've been testing parsing and indexing with Solr
on glassfish installed on my local machine. It worked just fine. Then I
wanted to install it on some remote server. The same version of
glassfish (3.1.1) is installed there. I copy-pasted the Solr application and its
home directory with all libraries (including the Tika jars and the jar with my
custom parser). Unfortunately it doesn't work. After posting files to Solr
I can see in the content-type field that it detected my custom mime type. But
none of the fields that are supposed to be there appear, as if the MyParser class
was never run. The only fields I get are the ones from Dublin Core. I
checked (by simply adding some println statements) that Tika is only using
XMLParser.
Has anyone had a similar problem? How should I handle this?
Regards,
Ola


Re: need to support bi-directional synonyms

2012-02-22 Thread Bernd Fehling

Use

sprayer, washer

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Regards
Bernd
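For completeness, a minimal sketch of how that line and the filter fit together
(file name, expand setting and analyzer placement are the usual conventions, not
necessarily the exact setup in question):

synonyms.txt:

  sprayer, washer

In the analyzer of the desc field's type:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

With a comma-separated group and expand="true", each term is expanded to all terms
in the group, which gives the bi-directional behaviour asked for.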

On 23.02.2012 07:03, remi tassing wrote:

Same question here...

On Wednesday, February 22, 2012, geeky2  wrote:

hello all,

i need to support the following:

if the user enters "sprayer" in the desc field - then they get results for
BOTH "sprayer" and "washer".

and in the other direction

if the user enters "washer" in the desc field - then they get results for
BOTH "washer" and "sprayer".

would i set up my synonym file like this?

assuming expand = true..

sprayer =>  washer
washer =>  sprayer

thank you,
mark

--
View this message in context:

http://lucene.472066.n3.nabble.com/need-to-support-bi-directional-synonyms-tp3767990p3767990.html

Sent from the Solr - User mailing list archive at Nabble.com.





Re: SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher

2012-02-22 Thread eks dev
thanks Mark, I will give it a go and report back...

On Thu, Feb 23, 2012 at 1:31 AM, Mark Miller  wrote:
> Looks like an issue around replication IndexWriter reboot, soft commits and 
> hard commits.
>
> I think I've got a workaround for it:
>
> Index: solr/core/src/java/org/apache/solr/handler/SnapPuller.java
> ===
> --- solr/core/src/java/org/apache/solr/handler/SnapPuller.java  (revision 
> 1292344)
> +++ solr/core/src/java/org/apache/solr/handler/SnapPuller.java  (working copy)
> @@ -499,6 +499,17 @@
>
>       // reboot the writer on the new index and get a new searcher
>       solrCore.getUpdateHandler().newIndexWriter();
> +      Future[] waitSearcher = new Future[1];
> +      solrCore.getSearcher(true, false, waitSearcher, true);
> +      if (waitSearcher[0] != null) {
> +        try {
> +         waitSearcher[0].get();
> +       } catch (InterruptedException e) {
> +         SolrException.log(LOG,e);
> +       } catch (ExecutionException e) {
> +         SolrException.log(LOG,e);
> +       }
> +     }
>       // update our commit point to the right dir
>       solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
>
> That should allow the searcher that the following commit command prompts to 
> see the *new* IndexWriter.
>
> On Feb 22, 2012, at 10:56 AM, eks dev wrote:
>
>> We started observing strange failures from ReplicationHandler when we
>> commit on master, trunk version 4-5 days old.
>> It works sometimes and sometimes not; we didn't dig deeper yet.
>>
>> Looks like the real culprit hides behind:
>> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>>
>> Looks familiar to somebody?
>>
>>
>> 120222 154959 SEVERE SnapPull failed
>> :org.apache.solr.common.SolrException: Error opening new searcher
>>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1138)
>>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1251)
>>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1043)
>>    at org.apache.solr.update.DirectUpdateHandler2.commit(Unknown Source)
>>    at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:503)
>>    at 
>> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:348)
>>    at org.apache.solr.handler.ReplicationHandler.doFetch(Unknown Source)
>>    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:163)
>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>    at 
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>    at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>    at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>    at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>    at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>    at java.lang.Thread.run(Thread.java:722)
>> Caused by: org.apache.lucene.store.AlreadyClosedException: this
>> IndexWriter is closed
>>    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:810)
>>    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:815)
>>    at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:3984)
>>    at 
>> org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:254)
>>    at 
>> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:233)
>>    at 
>> org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:223)
>>    at 
>> org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
>>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1095)
>>    ... 15 more
>
> - Mark Miller
> lucidimagination.com
>


Re: Development inside or outside of Solr?

2012-02-22 Thread bing
Hi, Erick, 

The example is impressive. Thank you. 

For the first, we decided not to do that, as Tika extraction is the
time-consuming part of indexing large files, and the dual call makes the
situation worse. 

For the second, for now, we chose DSpace to connect to the DB and
Discovery (Solr) as the index/query layer. Thus, we might make revisions in DSpace. 

Best Regards, 
Bing 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3768977.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do nested entities have a representation in Solr indexes?

2012-02-22 Thread Mikhail Khludnev
Hello Mike,

Solr is still too flat. Work is in progress:
https://issues.apache.org/jira/browse/SOLR-3076
A good introduction is in Michael's blog
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
but it's only about Lucene queries.
A colleague of mine blogged about the same problem but solved it with an
alternative approach: http://blog.griddynamics.com/search/label/Solr
Finally we gave up on term positions/spans and are considering BJQ (block join
queries) as a solution.

Regards

On Thu, Feb 23, 2012 at 5:37 AM, Mike O'Leary  wrote:

> The data-config.xml file that I have for indexing database contents has
> nested entity nodes within a document node, and each of the entities
> contains field nodes. Lucene indexes consist of documents that contain
> fields. What about entities? If you change the way entities are structured
> in a data-config.xml file, in what way (if any) does it change how the
> contents are stored in the index. When I created the entities I am using,
> and defined the fields in one of the inner entities to be multivalued, I
> thought that the fields of that entity type would be grouped logically
> somehow in the index, but then I remembered that Lucene doesn't have a
> concept of sub-documents (that I know of), so each of the field values will
> be added to a list, and the extent of the logical grouping would be that
> the field values that were indexed together would be at the same position
> in their respective lists. Am I understanding this right, or do entities as
> defined in data-config.xml have some kind of representation in the index
> like document and field do?
> Thanks,
> Mike
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics