Re: update some fields vs replace the whole document

2013-03-10 Thread Upayavira
In terms of the impact upon the index, there is no difference: both do
the same thing - mark the previous doc as deleted and insert a new one. As
Jack says, atomic updates may simply be easier for you from an application
perspective.

Note that Solr/Lucene are heavily optimised towards reading; writing is a
relatively heavy operation.

Upayavira
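For illustration, a minimal SolrJ 4.x sketch of the two approaches (the core
URL and field names are hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdateVsReplace {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            // Full replace: re-send the whole document under the same unique
            // key; Solr marks the old version deleted and writes a new one.
            SolrInputDocument full = new SolrInputDocument();
            full.addField("id", "doc1");
            full.addField("title", "new title");
            full.addField("price", 9.99);
            server.add(full);

            // Atomic update: send only the changed fields, each wrapped in a
            // modifier map ("set" replaces the stored value). Solr retrieves
            // the remaining stored fields itself to rebuild the full document.
            // (Requires <updateLog/> and that non-copied fields be stored.)
            SolrInputDocument partial = new SolrInputDocument();
            partial.addField("id", "doc1");
            Map<String, Object> setPrice = new HashMap<String, Object>();
            setPrice.put("set", 8.99);
            partial.addField("price", setPrice);
            server.add(partial);

            server.commit();
        }
    }

Either way, the index-level effect is identical; only the client-side work
differs.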

On Fri, Mar 8, 2013, at 10:41 PM, Mingfeng Yang wrote:
> Then what's the difference between adding a new document vs.
> replacing/overwriting a document?
> 
> Ming-
> 
> 
> On Fri, Mar 8, 2013 at 2:07 PM, Upayavira  wrote:
> 
> > With an atomic update, you need to retrieve the stored fields in order
> > to build up the full document to insert back.
> >
> > In either case, you'll have to locate the previous version and mark it
> > deleted before you can insert the new version.
> >
> > I bet that the amount of time spent retrieving stored fields is matched
> > by the time saved by not having to transmit those fields over the wire,
> > although I'd be very curious to see someone actually test that.
> >
> > Upayavira
> >
> > On Fri, Mar 8, 2013, at 09:51 PM, Mingfeng Yang wrote:
> > > Generally speaking, which has better performance for Solr?
> > > 1. updating some fields or adding new fields into a document.
> > > or
> > > 2. replacing the whole document.
> > >
> > > As I understand it, updating fields needs to fetch the corresponding doc
> > > first and then replace field values, while replacing the whole document
> > > is just like adding a new document.  Is that right?
> >


How to combine Date range query with negation query

2013-03-10 Thread A Geek

Hi All, I'm trying to run a query against two fields: author_location
(default="unset") and created_at (a stored date field).

For the majority of documents, author_location holds the default value
"unset". I want to run a query where author_location has some value other
than "unset" and created_at is greater than a given timestamp. I tried
running the following query:

-author_location:unset&created_at:[2013-03-10T06:30:21Z TO *]

but it's not working: it returns results that contain author_location=unset.
I also tried a filter query, but that must be wrong too, as the results
still include author_location=unset documents.
I'd appreciate it if someone could point me to the right query. Please note
that I'm running Solr (solr-spec 4.0.0.2012.10.06.03.04.33) on a Linux machine.
Thanks in advance. 
Regards, DK

  

Re: How to combine Date range query with negation query

2013-03-10 Thread Jack Krupansky
Although pure negative queries are supposed to work, there have been bugs in 
various releases, so make the query explicit:


*:* -author_location:unset
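Combining that with the date range, one form that should work (an untested
sketch; the filter query keeps the two clauses independently cacheable):

q=*:* -author_location:unset
fq=created_at:[2013-03-10T06:30:21Z TO *]

or as a single query string:

q=(*:* -author_location:unset) AND created_at:[2013-03-10T06:30:21Z TO *]

Also note that if the & in your original query was sent unescaped in the URL,
created_at:[...] would have been parsed as a separate (and ignored) request
parameter rather than as part of q.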

Do you actually use an explicit string of "unset" when adding documents?

-- Jack Krupansky

-Original Message- 
From: A Geek

Sent: Sunday, March 10, 2013 7:06 AM
To: solr user
Subject: How to combine Date range query with negation query


Hi All, I'm trying to run a query against two fields: author_location
(default="unset") and created_at (a stored date field).

For the majority of documents, author_location holds the default value
"unset". I want to run a query where author_location has some value other
than "unset" and created_at is greater than a given timestamp. I tried
running the following query:

-author_location:unset&created_at:[2013-03-10T06:30:21Z TO *]

but it's not working: it returns results that contain author_location=unset.
I also tried a filter query, but that must be wrong too, as the results
still include author_location=unset documents.

I'd appreciate it if someone could point me to the right query. Please note
that I'm running Solr (solr-spec 4.0.0.2012.10.06.03.04.33) on a Linux
machine.

Thanks in advance.
Regards, DK





Re: Solr 4.x auto-increment/sequence/counter functionality.

2013-03-10 Thread mark12345
A slightly different approach.

* I noticed that I can sort by the internal Lucene _docid_.

->   http://wiki.apache.org/solr/CommonQueryParameters
  

> You can sort by index id using sort=_docid_ asc or sort=_docid_ desc

* I have also read that the docid is a sequential number.

->  
http://lucene.472066.n3.nabble.com/Get-DocID-after-Document-insert-td556278.html

  

>  Your document IDs may change, and in fact *will* change if you delete a
> document and then optimize. Say you index 100 docs, delete number 50 and
> optimize. Documents that originally had IDs 51-100 will now have IDs 50-99
> and your hierarchy will be messed up. 


So _docid_ might happen to reflect document creation order, but only until
deletes and merges renumber documents. Does anyone have knowledge of and
experience with the internals of the Lucene _docid_?
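For reference, a minimal SolrJ sketch of sorting by _docid_ (with the caveat
above that the ordering is not stable across deletes and merges; the core URL
is hypothetical):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DocIdSortExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("*:*");
            // internal Lucene doc id, not a stored field
            q.set("sort", "_docid_ asc");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults().size() + " docs in index order");
        }
    }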



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-x-auto-increment-sequence-counter-functionality-tp4045125p4046137.html
Sent from the Solr - User mailing list archive at Nabble.com.


[ANN] vifun: a GUI to help visually tweak Solr scoring, release 0.6

2013-03-10 Thread xavier jmlucjav
Hi,

I am releasing a new version (0.6) of vifun, a GUI to help visually tweak
Solr scoring. The most relevant changes are:
- support float values
- add support for tie
- sync the Current/Baseline scrollbars (when the checkbox is selected)
- double-click on a doc: show a side-by-side comparison of debug score info
- upgrade to Griffon 1.2.0
- allow using a handler other than /select

You can check it out here: https://github.com/jmlucjav/vifun
Binary distribution:
http://code.google.com/p/vifun/downloads/detail?name=vifun-0.6.zip

xavier


Re: How to add shard in 4.2-snapshot

2013-03-10 Thread Jam Luo
I build indexes with EmbeddedSolrServer, then move them to the online
system. The online system does not add new documents, so the hashcode range
is not important; I only need to add a shard. How do I customise that?
thanks


2013/3/10 adfel70 

> Mark, what's the current estimation for official 4.2 release?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-add-shard-in-4-2-snapshot-tp4045716p4046099.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to add shard in 4.2-snapshot

2013-03-10 Thread Mark Miller
A vote is happening now - it will likely be early this week.

- Mark

On Mar 10, 2013, at 7:17 AM, adfel70  wrote:

> Mark, what's the current estimation for official 4.2 release?
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-add-shard-in-4-2-snapshot-tp4045716p4046099.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Mark document as hidden

2013-03-10 Thread Erik Hatcher
Seems like that technique would work, as long as the file is saved and flushed 
before the actual commit occurs.

Erik
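As a rough sketch of the factory approach lboutros describes below (the class
name and file path are hypothetical; the flush before super.processAdd() and
the commit is the part that matters):

    import java.io.FileWriter;
    import java.io.IOException;

    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class HiddenDocMarkerFactory extends UpdateRequestProcessorFactory {
        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                SolrQueryResponse rsp, UpdateRequestProcessor next) {
            return new UpdateRequestProcessor(next) {
                @Override
                public void processAdd(AddUpdateCommand cmd) throws IOException {
                    // Hypothetical: append the doc id to an external "hidden"
                    // file, flushing immediately so it is on disk before the
                    // commit happens.
                    Object id = cmd.getSolrInputDocument().getFieldValue("id");
                    FileWriter w = new FileWriter("/path/to/hidden-docs.txt", true);
                    try {
                        w.write(id + "\n");
                        w.flush();
                    } finally {
                        w.close();
                    }
                    super.processAdd(cmd);
                }
            };
        }
    }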

On Mar 8, 2013, at 12:17 , lboutros wrote:

> I could create an UpdateRequestProcessorFactory that updates this file -
> would that be better?
> 
> 
> 
> -
> Jouve
> France.
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Mark-document-as-hidden-tp4045756p4045842.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Feeding Custom QueryParser with Nested Query

2013-03-10 Thread jimtronic
It seems like I could accomplish this by following the JoinQParserPlugin
logic. I can actually get pretty close using the join query, but I need to do
some extra math in the middle.

The difference in my case is that I need to access the id and the score. I
*think* the logic would go something like this:

1. do sub query to get doc ids and score
2. use the resulting doc ids to feed into another query.
3. write a custom scorer that uses the score from the subquery to determine
the scores of the final results.
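A rough SolrJ sketch of steps 1 and 2 (field names are hypothetical; step 3
would need a custom plugin, e.g. something along the lines of Lucene's
CustomScoreQuery):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class TwoPhaseQuery {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            // 1. Sub query: collect doc ids and their scores.
            SolrQuery sub = new SolrQuery("category:books");
            sub.setFields("id", "score");
            Map<String, Float> subScores = new HashMap<String, Float>();
            for (SolrDocument d : server.query(sub).getResults()) {
                subScores.put((String) d.getFieldValue("id"),
                              (Float) d.getFieldValue("score"));
            }

            // 2. Feed the ids into a second query.
            StringBuilder q = new StringBuilder("linked_id:(");
            for (String id : subScores.keySet()) {
                q.append(id).append(' ');
            }
            q.append(')');
            QueryResponse rsp = server.query(new SolrQuery(q.toString()));

            // 3. A custom scorer would combine rsp's scores with subScores here.
        }
    }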

Thanks for any suggestions...

Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Feeding-Custom-QueryParser-with-Nested-Query-tp4046007p4046162.html
Sent from the Solr - User mailing list archive at Nabble.com.


optimal maxWarmingSearchers in solr cloud

2013-03-10 Thread jimtronic
The notes for maxWarmingSearchers in solrconfig.xml state:

"Recommend values of 1-2 for read-only slaves, higher for masters w/o cache
warming."

Since SolrCloud nodes can be either leader or non-leader depending on
the current state of the cloud, what would be the optimal setting here?

Thanks!
Jim



--
View this message in context: 
http://lucene.472066.n3.nabble.com/optimal-maxWarmingSearchers-in-solr-cloud-tp4046164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: optimal maxWarmingSearchers in solr cloud

2013-03-10 Thread Timothy Potter
Sorry I'm answering your question with more questions, but this is an area
of special interest for me too.

It's tough to give an exact value, as it depends on how frequently you do
full "hard" commits and how much warming you do when you open a new
searcher. Broadly speaking, you want a smallish value, as background
warming can be expensive. So how often are you doing a "hard" commit, and
how long does it take to warm up your searchers?

A few things to consider are:

auto-commits - you can use openSearcher=false to avoid opening a new searcher
when doing large batch updates. This lets you auto-commit more frequently, so
your update log doesn't get too big, without paying the price of warming a
new searcher on every auto-commit.
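For example, something like this in solrconfig.xml (the values are
illustrative):

    <autoCommit>
      <maxTime>15000</maxTime>           <!-- hard commit every 15 seconds -->
      <openSearcher>false</openSearcher> <!-- flush and truncate the update
                                              log without warming a new
                                              searcher -->
    </autoCommit>

You'd then rely on soft commits or explicit commits with openSearcher=true
for visibility of new documents.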

new searcher warming queries - how many of these do you have and how long
do they take to warm up? You can get searcher warm-up time from the
Searcher MBean in the admin console.

cache auto-warming - again, how much of your existing caches are you
auto-warming? Keep a close eye on the filterCache and its autowarmCount.
Warm-up times are also available from the admin console.

Cheers,
Tim


On Sun, Mar 10, 2013 at 12:54 PM, jimtronic  wrote:

> The notes for maxWarmingSearchers in solrconfig.xml state:
>
> "Recommend values of 1-2 for read-only slaves, higher for masters w/o cache
> warming."
>
> Since SolrCloud nodes can be either leader or non-leader depending on
> the current state of the cloud, what would be the optimal setting here?
>
> Thanks!
> Jim
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/optimal-maxWarmingSearchers-in-solr-cloud-tp4046164.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Memory Guidance

2013-03-10 Thread Timothy Potter
Hi Jim,

I'd venture to guess your Solr core is using MMapDirectory, and if so, the
physical memory value is expected and nothing to worry about. The index is
mapped into virtual memory using memory-mapped I/O.

The file descriptor count looks fine too, but when using MMapDirectory make
sure your OS reports unlimited for ulimit -v (virtual memory) and ulimit -m
(max resident set size).

You'll probably want to give your JVM a bit more RAM if you can spare it
(but not too much more), especially if you do lots of custom sorting.

Good read if you haven't seen it:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Tim

On Sun, Mar 10, 2013 at 8:00 PM, jimtronic  wrote:

> I'm having trouble finding some problems while load testing my setup.
>
> If you saw these numbers on your dashboard, would they worry you?
>
> Physical Memory  97.6%
> 14.64 GB of 15.01 GB
>
> File Descriptor Count  19.1%
> 196 of 1024
>
> JVM-Memory  95%
> 1.67 GB (dark gray)
> 1.76 GB (med gray)
> 1.76 GB
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Memory-Guidance-tp4046207.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Custom update handler?

2013-03-10 Thread Jack Park
With 4.1, not in cloud configuration, I have a custom update request
processor chain which injects an additional processor for studying the
documents as they come in. But when I do partial updates on those documents,
I don't want them to be studied again, so I created another version of the
same chain, but without my added processor. I named it "/partial".

When I point a SolrJ instance at the URL /solr/partial,
I get back this error message:

Server at http://localhost:8983/solr/partial returned non ok
status:404, message:Not Found
{locator=2146fd50-fac9-47d5-85c0-47aaeafe177f,
tuples={set=99edfffe-b65c-4b5e-9436-67085ce49c9c}}

Here is the configuration for that:

[updateRequestProcessorChain definition stripped by the list archive]

The normal handler chain is this:

[updateRequestProcessorChain definition stripped by the list archive; it
adds the custom processor, configured with the string "hello"]

which runs with SolrJ pointed at http://localhost:8983/solr/

What might I be missing?

Many thanks
Jack
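For what it's worth, a minimal SolrJ sketch of selecting an update chain per
request via the update.chain parameter, rather than via the core URL
(assuming the chain is registered under the name "partial"; the core URL and
field values are hypothetical):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class PartialChainUpdate {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc1");

            UpdateRequest req = new UpdateRequest();
            // route this update through the "partial" chain
            req.setParam("update.chain", "partial");
            req.add(doc);
            req.process(server);
            server.commit();
        }
    }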