UnionDocsAndPositionsEnum class not found

2013-02-06 Thread Markus Jelsma
Hi,

We're getting the following trace for some Dismax queries that contain 
non-alphanumerics:

Feb 6, 2013 10:06:56 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/lucene/search/UnionDocsAndPositionsEnum
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:483)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/search/UnionDocsAndPositionsEnum
    at org.apache.lucene.search.MultiPhraseQuery.createWeight(MultiPhraseQuery.java:302)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:187)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:401)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:657)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:290)
    at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1586)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1306)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:401)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:418)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:216)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:469)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    ... 25 more
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.search.UnionDocsAndPositionsEnum
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:430)
    at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:383)
    ... 39 more

The trace is confusing; any ideas? Should I file a bug? My build is clean and
updated to yesterday's trunk.

Thanks,
Markus


Re: auto trigger the delta import to update index in solr if any update in sql database

2013-02-06 Thread Rohan Thakur
Hi,

Thanks, but I think this one is for MS SQL, not for MySQL.

regards
Rohan

On Wed, Feb 6, 2013 at 11:53 AM, jp  wrote:

> The following link provides information on using an external activator for
> tracking DB changes:
> http://ajitananthram.wordpress.com/2012/05/26/auditing-external-activator/
>
> --JP
>
>
>
>
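
For MySQL, a simpler alternative to database-side activators is to have the
application that writes to the database (or a scheduled job) hit the
DataImportHandler's delta-import command after changes; a minimal sketch,
assuming a DIH handler registered at /dataimport (host, port and path are
placeholders):

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Trigger a DIH delta-import over HTTP after writing to MySQL.
    public class DeltaImportTrigger {
      public static void main(String[] args) throws Exception {
        URL url = new URL(
            "http://localhost:8983/solr/dataimport?command=delta-import&commit=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        InputStream in = conn.getInputStream(); // DIH kicks off asynchronously
        in.close();
        conn.disconnect();
        // Poll /dataimport?command=status to watch the import's progress.
      }
    }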


Re: Fwd: advice about develop AbstractSolrEventListener.

2013-02-06 Thread Miguel

Hi

I found a solution. I am going to configure Update Request Processors, as
described in:
http://wiki.apache.org/solr/UpdateRequestProcessor


If I develop a custom class extending UpdateRequestProcessorFactory,
I'll have access to:


 * SolrQueryRequest req (the request object)
 * SolrQueryResponse rsp (the response object)
 * UpdateRequestProcessor next (the next processor, which receives the
   stream of updated data)

By configuring solrconfig.xml I can include my custom processor in the update
handler (solr.XmlUpdateRequestHandler) by selecting my update chain. This
way seems independent of which handler Solr uses for the update process, and
it allows me to attach events to the updated records after the commit event.
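
A minimal sketch of what such a factory might look like, collecting the ids
of added documents and notifying an external service once the commit goes
through (class and helper names are hypothetical, not the actual code):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.CommitUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    public class NotifyingProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new NotifyingProcessor(next);
      }

      static class NotifyingProcessor extends UpdateRequestProcessor {
        private final List<String> seenIds = new ArrayList<String>();

        NotifyingProcessor(UpdateRequestProcessor next) {
          super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          // Record which document passed through, then continue the chain.
          seenIds.add(String.valueOf(cmd.getSolrInputDocument().getFieldValue("id")));
          super.processAdd(cmd);
        }

        @Override
        public void processCommit(CommitUpdateCommand cmd) throws IOException {
          super.processCommit(cmd);
          // The commit has succeeded; notify the external web service.
          notifyWebService(seenIds); // hypothetical helper
          seenIds.clear();
        }

        private void notifyWebService(List<String> ids) {
          // Call out to the external web service with the updated ids here.
        }
      }
    }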


Thanks


El 04/02/2013 9:03, Miguel escribió:

Hi everybody

  Please, I need to know if anybody has done something similar.
I have to develop a notification when a commit event happens on the Solr
server, but I need to know the updated records in order to create the
notification correctly. Developing a Java class that extends
AbstractSolrEventListener, I don't see how to get the updated data
associated with the commit event.


Thanks for help

El 31/01/2013 9:32, Miguel escribió:

Hi

  After studying the Apache Solr documentation, I think the only way to know
the updated records (modify, delete and insert actions) is to develop a
class that extends org.apache.solr.servlet.SolrUpdateServlet.
In this class, I can access the updated record information going into the
Apache Solr server.


Can somebody confirm that this is the best way? Or are there other options?


thanks

El 30/01/2013 13:39, Miguel escribió:


Hi

I have to develop a function that communicates with a web service, and
this function must execute after each commit.
My doubt:
is it possible to get the records that have been updated in the Solr index?
My function must send information about added, updated and deleted records
in the Solr index to an external web service, and this information must be
sent after the commit event.

I have read the Apache Solr wiki, and it seems the best way is to create a
listener with event=postCommit, but looking at the example
"solr.RunExecutableListener" I don't see how to know the records
associated with the commit event.

Example solrconfig.xml:

  <listener event="postCommit" class="solr.RunExecutableListener">
    ...
  </listener>


Thanks.











Re: Dynamic fields - names having special characters like ">" "<"

2013-02-06 Thread Rajani Maski
Hi all,


  I found a solution myself: replace < and > with &lt; and &gt; (i.e.,
XML-escape them).
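
A minimal sketch of that escaping in Java ('&' must be escaped first so the
entities themselves are not re-escaped; rawFieldName is a placeholder):

    String escaped = rawFieldName.replace("&", "&amp;")
                                 .replace("<", "&lt;")
                                 .replace(">", "&gt;");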


Thanks & Regards
Rajani



On Wed, Feb 6, 2013 at 3:50 PM, Rajani Maski  wrote:

> Hi all,
>
>   We have a few *dynamic field names* coming in with special characters, e.g. *
> _str*. Solr throws the error: *org.apache.solr.common.SolrException:
> Unexpected '<'*
> I followed this link for escaping - *_str* didn't work.
>
>
> Is there anyway to handle such cases?
>
>
>
> Awaiting your reply
>
>
> Thanks & Regards
> Rajani
>


memory leak - multiple cores

2013-02-06 Thread Marcos Mendez
Hi,

I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
following issue and it eats up a lot of memory when shutting down. Has
anyone seen this and have an idea how to solve it?

Exception in thread "DefaultThreadPool 196" java.lang.OutOfMemoryError:
PermGen space
2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
LEAK!!!
2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
LEAK!!!
2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477

Regards,
Marcos


Re: Controlling traffic between solr 4.1 nodes

2013-02-06 Thread Erick Erickson
You're not missing anything that I know of. The best I've been able to come
up with so far is to treat the disparate DCs as separate clusters. Your
ingestion process needs to know enough to send updates to both DCs but
that's the only point of contact.

The problem I see here is not only with inter-DC communications, but ZK.
Let's say ZK1 is in DC1 and ZK2 and ZK3 are in DC2. Now anytime the
connection is lost, DC1 is down since it can't sense a  ZK quorum. I know
of one person who put the ZK nodes in three separate DCs to help with that
problem.

But the bottom line is that SolrCloud chatters amongst nodes and you have
no good ways to control it. Either you have to accept the latency between
DCs or use separate clusters, as far as I know. I do know there are some JIRAs
about making SolrCloud "rack aware" which may address this, but I don't
think they're in place yet.

Best
Erick


On Tue, Feb 5, 2013 at 10:17 AM, Michael Tracey  wrote:

> Hey all, new to Solr 4.x, and am wondering if there is any way that I
> could have a single collection (single or multiple shards) replicated into
> two datacenters, where only 1 solr instance in each datacenter communicate.
>  (for example, 4 servers in one DC, 4 servers in another datacenter and
> only one in each DC communicate).
>
> From everything I've seen, all zookeepers and replicas must have access to
> all other members.  Is there something I'm missing?
>
> Thanks,
>
> M.
>


Re: Getting Lucense Query from Solr query (Or converting Solr Query to Lucense's query)

2013-02-06 Thread Sabeer Hussain
It is working, but I have come across another problem. I am expecting the same
parsed query from the following two approaches, but I am not getting it:

Approach 1:
BooleanQuery myQuery = new BooleanQuery();
myQuery.add(new TermQuery(new Term("PATIENT_GENDER", "Male")),
BooleanClause.Occur.SHOULD);
myQuery.add(new TermQuery(new Term("STUDY_DIVISION", "\"Cancer Center\"")),
BooleanClause.Occur.SHOULD);
System.out.println("parsedquery >>" + myQuery.toString());

output is
parsedquery >>PATIENT_GENDER:Male STUDY_DIVISION:"Cancer Center"

Approach 2:
String defaultField = "text";
String queryString = "PATIENT_GENDER:Male OR STUDY_DIVISION:\"Cancer Center\"";
QueryParser qp = new QueryParser(Version.LUCENE_40, defaultField, new
StandardAnalyzer(Version.LUCENE_40));

System.out.println("querystring >>" + queryString);
Query q = qp.parse(queryString);
System.out.println("parsedquery >>" + q.toString());

output is
querystring >>PATIENT_GENDER:Male OR STUDY_DIVISION:"Cancer Center"
parsedquery >>PATIENT_GENDER:male STUDY_DIVISION:"cancer center"

In Approach 2, the values are converted to lower case (the StandardAnalyzer
lowercases terms), and because of that I am not getting the same results even
though my query strings are the same.

If I have some fq parameters (filter queries), how can I parse the query for
Lucene? Is there any place where I can get more information about this kind
of programming? Any book available?
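
One way to apply an fq-style restriction in plain Lucene 4.x is to parse the
filter query and wrap it in a QueryWrapperFilter; a minimal sketch, reusing
the qp parser above and assuming an IndexSearcher named searcher (the filter
query shown is a hypothetical example):

    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TopDocs;

    Query mainQuery = qp.parse("PATIENT_GENDER:Male OR STUDY_DIVISION:\"Cancer Center\"");
    Query filterQuery = qp.parse("STUDY_SITE:Boston"); // hypothetical fq
    Filter filter = new QueryWrapperFilter(filterQuery);
    // Like Solr's fq, the filter restricts matches without affecting scores.
    TopDocs hits = searcher.search(mainQuery, filter, 10);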

-- Sabeer





Re: Multi-threaded post.jar?

2013-02-06 Thread Jan Høydahl
With dependencies I meant external jar dependencies. Perhaps extensions could 
have deps while leaving the "core" compilable without?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

5. feb. 2013 kl. 17:10 skrev Upayavira :

> By dependencies, do you mean other java classes? I was thinking of
> splitting it out into a few classes, each of which is clearer in its
> purpose.
> 
> Upayavira
> 
> On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
>> Wiki page exists already: http://wiki.apache.org/solr/post.jar
>> 
>> I'm happy to consider a refactoring, especially if it makes it SIMPLER to
>> read and interact with and doesn't add a ton of mandatory dependencies.
>> It should probably still be possible to say something like
>> 
>>  javac org/apache/solr/util/SimplePostTool.java
>>  java -cp . org.apache.solr.util.SimplePostTool -h
>> 
>> That's just how I've been thinking so far though. If other committers are
>> happy with abandoning the simple-ness and instead create a best-practices
>> based feature-rich tool with dependencies, then I'll not object.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> 5. feb. 2013 kl. 05:22 skrev Upayavira :
>> 
>>> Thx Jan,
>>> 
>>> All I know is I've got a data set of 500k documents, Solr formatted, and
>>> I want it to be as easy as possible to get them into Solr. I also want
>>> to be able to show the benefit of multithreading. The outcome would
>>> really be "make sure your code uses multiple threads to push to Solr"
>>> rather than "use post.jar in production". I see post.jar as a
>>> demonstration tool, rather than anything else, and am considering adding
>>> another feature to enhance that.
>>> 
>>> However, I did stall once I started looking at the SimplePostTool.jar
>>> class, because it is losing its connection with the term 'Simple'.
>>> Adding multithreading, however useful, correct, whatever, would
>>> completely push it over the edge. Thus, I think the proper approach is
>>> to refactor the tool into a number of classes, and only then think about
>>> adding multithreading as a completely separate affair. I'm more than
>>> happy to have a go at that refactoring, especially if you're prepared to
>>> review it.
>>> 
>>> I guess the other thing that is much needed is a wiki page that details
>>> the features of the tool, and also explains that its role is
>>> educational, rather than anything else.
>>> 
>>> Upayavira
>>> 
>>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
 Hi,
 
 Hmm, the tool is getting bloated for a one-class no-deps tool already :)
 Guess it would be useful too with real-life code examples using SolrJ and
 other libs as well (such as robots.txt lib, commons-cli etc), but whether
 that should be an extension of SimplePostTool or a totally new tool from
 scratch is something to discuss. Please bring on your ideas of how you
 plan to extend it, perhaps even simplifying the code in the process?
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com
 
 3. feb. 2013 kl. 17:19 skrev Upayavira :
 
> I have a scenario in which I need to post 500,000 documents to Solr as a
> test. I have these documents in XML files already formatted in Solr's
> xml format.
> 
> Posting to Solr using post.jar it takes 1m55s. With a bit of bash
> jiggery-pokery, I was able to get this down to 1m08s by running four
> concurrent post.jar instances, which strikes me as a significant
> improvement.
> 
> I'm considering adding multithreaded capabilities to post.jar, but
> before I go to that effort, I wanted to see if anyone else would
> consider it a useful feature. Given that the SimplePostTool is becoming
> far from simple, I wanted to see whether the feature is likely to be
> accepted before I put in the effort. Also, I would need to consider
> which parts of the tool to add that to. Currently I only want it for
> posting XML docs, but there's also crawling capabilities in it too.
> 
> Thoughts?
> 
> Upayavira
 
>> 



Re: distinct count of facet field values

2013-02-06 Thread Joey Dale

Try q=*:*&group=true&group.field=cat&group.ngroups=true

On 2/4/13 3:53 AM, J Mohamed Zahoor wrote:

Hi

Is it possible to get the distinct count of a given facet field in Solr?

A query like this, q=*:*&facet=true&facet.field=cat, displays the counts of all
the unique categories present, like

electronics: 100
appliances: 200  etc.

But if the list is big, I don't want to fetch the entire list and take a count
by looping. If I can just get a count of the number of items in the list, I am
okay.

SOLR-2242 does just that, but it is not giving a distinct count if I have
multiple shards.

Is there any other way to get this?

./Zahoor






Re: Upgrading indexes from Solr 1.4.1 to 4.1.0

2013-02-06 Thread Artem OXSEED
It turns out that all our fields are stored, and restoring from the
source data is a bit of a problem. I've tried DIH/SolrEntityProcessor and
it seems to be working out well, so I'll probably end up using it.

Thank you!

--
Warm regards,
Artem Karpenko

On 04.02.2013 19:58, Lance Norskog wrote:

A side problem here is text analyzers: the analyzers have changed how
they split apart text for searching, and they are matched pairs. That is, the
queries are created to match what the analyzer did when
indexing. If you do this binary upgrade sequence, the indexed data will
not match what the analyzers now do. It is not a major problem, but queries
will not bring back what you expect.

Also, in 4.x, the unique field has to be called 'id' and every document
needs a '_version_' field.

On 02/04/2013 09:32 AM, Upayavira wrote:

Just to add a little to the good stuff Shawn has shared here - Solr 4.1
does not support 1.4.1 indexes. If you cannot re-index (by far
recommended), then first upgrade to 3.6, then optimize your index, which
will convert it to 3.6 format. Then you will be able to use that index
in 4.1. The simple logic here is that Solr/Lucene can read the indexes
of the previous major version. Given you are two major versions behind,
you'd have to do it in two steps.

Upayavira

On Mon, Feb 4, 2013, at 03:18 PM, Shawn Heisey wrote:

On 2/4/2013 7:20 AM, Artem OXSEED wrote:

I need to upgrade our Solr installation from 1.4.1 to the latest 4.1.0
version. The question is how to deal with indexes. AFAIU there are two
things to be aware of: file format and index format (excuse me for
possible term mismatch, I'm new to Solr) - and while file format can
(and will automatically?) be updated if old index files are used by new
Solr installation, one cannot say the same about index format. Is it true?

And if the above is true then the question is - should this "index
format" be updated at all - i.e. if we can happily live with it then
it's fine, but I guess that this decision will not bring
performance/feature improvements that were introduced since 1.4.1
version, will it?

Assuming we do need to update this "index format", how to do it? I found
solution on SO
(http://stackoverflow.com/questions/4528063/moving-data-from-solr-1-5-to-solr-4-0)
that includes usage of some "export to XML" feature, maybe with Luke,
some custom-made XSLT transformation and import back. Seems like a lot
to do - although it's quite understandable. However, this answer was
given in 2010 with Solr 4.0 being in pre-alpha - so maybe there are now
tools for this now?

Artem,

When upgrading Solr, the absolute best option is always to delete (or
move) your index directory, let the new version recreate it, and rebuild
from scratch by reindexing from your original data source.  This should
always remain an option - the indexes may get corrupted by an unexpected
situation.  If you have the ability to rebuild your 1.4.1 index from
your original data source, then it should be straightforward to do the
same thing on the new version.

Solr 4.1 can read version 3.x indexes, but I would not be surprised to
find that it can't read the Lucene 2.9.x format that Solr 1.4.1 uses.  I
don't know how much difference there is between the 2.9.x format and the
3.x format.  I'm not aware of a distinction between "file" and "index"
formats.

If a Solr version supports an older format, then it will read the
segments created in that format, but new segments will be in the new
format.  Solr/Lucene index segments on disk are never changed once they
are finalized.  They can be merged into new segments and then deleted,
but nothing will ever change them.

Have you stored every single field individually in Solr?  If you have,
then you will be able to retrieve the data to reindex into the new
version.  If you have fields that are indexed but not stored, then even
with the XML method you will be unable to obtain all the data.  It is
fairly normal in a Solr schema to have fields that you can search on but
that are not stored, because stored fields make the index larger.

If you have stored every single field in your index, you can also use
the SolrEntityProcessor in the dataimport handler to import from an old
Solr server to a new one.
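
A minimal data-config.xml sketch of that approach (host, core name and rows
are placeholders):

    <dataConfig>
      <document>
        <entity name="sep" processor="SolrEntityProcessor"
                url="http://oldhost:8983/solr/core0"
                query="*:*" rows="500"/>
      </document>
    </dataConfig>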

The critical piece of the puzzle for upgrading between incompatible
versions is that you must be storing every field in the old version
before you start.  If you aren't storing a particular field, then the
data from that field is not retrievable and you must go back to the
original data source.

http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

Thanks,
Shawn





Re: memory leak - multiple cores

2013-02-06 Thread Michael Della Bitta
Marcos,

The latter three errors are common and won't pose a problem unless you
intend to reload the Solr application without restarting Geronimo
often.

The first error, however, shouldn't happen. Have you changed the size
of PermGen at all? I noticed this error while testing Solr 4.0 in
Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
you might want to try upgrading.
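
If PermGen is the culprit, it can usually be raised with a JVM flag when the
container starts; a sketch (256m is an arbitrary starting point, and where to
set it depends on how Geronimo launches its JVM):

    java -XX:MaxPermSize=256m ...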


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez  wrote:
> Hi,
>
> I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
> following issue and it eats up a lot of memory when shutting down. Has
> anyone seen this and have an idea how to solve it?
>
> Exception in thread "DefaultThreadPool 196" java.lang.OutOfMemoryError:
> PermGen space
> 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
> not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
> LEAK!!!
> 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
> not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
> LEAK!!!
> 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
> shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
>  instance=2080324477
>
> Regards,
> Marcos


RE: Correct way for getting SolrCore?

2013-02-06 Thread Ryan Josal
This is perfect, thanks!  I'm surprised it eluded me for so long.
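
A minimal sketch of the SolrCoreAware approach discussed below (the factory
name is hypothetical): inform(SolrCore) is called once the core has finished
loading, which makes it a safe place for per-core initialization:

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;
    import org.apache.solr.util.plugin.SolrCoreAware;

    public class CoreAwareFactory extends UpdateRequestProcessorFactory
        implements SolrCoreAware {

      private SolrCore core;

      @Override
      public void init(NamedList args) {
        // Too early to touch the core here; it is still being constructed.
      }

      @Override
      public void inform(SolrCore core) {
        this.core = core;
        // Safe place for per-core setup, e.g. reading the index via
        // core.getSearcher(); remember to decref() the returned
        // RefCounted<SolrIndexSearcher> when done.
      }

      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return next; // pass-through; real logic would wrap next
      }
    }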

From: Mark Miller [markrmil...@gmail.com]
Sent: Tuesday, February 05, 2013 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Correct way for getting SolrCore?

The SolrCoreAware interface?

- Mark

On Feb 5, 2013, at 5:42 PM, Ryan Josal  wrote:

> By way of the deprecated SolrCore.getSolrCore method,
>
> SolrCore.getSolrCore().getCoreDescriptor().getCoreContainer().getCores()
>
> Solr starts up in an infinite recursive loop of loading cores.  I understand 
> now that the UpdateProcessorFactory is initialized as part of the core 
> initialization, so I expect there is no way to read the index of a core if 
> the core has not been initialized yet.  I still feel a bit uneasy about 
> initialization on the first update request, so is there some other place I 
> can plug in initialization code that runs after the core is loaded?  I suppose 
> I'd be using SolrCore.getSearcher().get().getIndexReader() to get the 
> IndexReader, but if that happens after a good point of plugging in this 
> initialization, then I guess SolrCore.getIndexReaderFactory() is the way to 
> go.
>
> Thanks,
> Ryan
> 
> From: Ryan Josal [rjo...@rim.com]
> Sent: Tuesday, February 05, 2013 1:27 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Correct way for getting SolrCore?
>
> Is there any way I can get the cores and do my initialization in the 
> @Override public void init(final NamedList args) method?  I could wait for 
> the first request, but I imagine I'd have to deal with indexing requests 
> piling up while I iterate over every document in every index.
>
> Ryan
> 
> From: Mark Miller [markrmil...@gmail.com]
> Sent: Tuesday, February 05, 2013 1:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Correct way for getting SolrCore?
>
> The request should give you access to the core - the core to the core 
> descriptor, the descriptor to the core container, which knows about all the 
> cores.
>
> - Mark
>
> On Feb 5, 2013, at 4:09 PM, Ryan Josal  wrote:
>
>> Hey guys,
>>
>> I am writing an UpdateRequestProcessorFactory plugin which needs to have 
>> some initialization code in the init method.  I need to build some 
>> information about each SolrCore in memory so that when an update comes in 
>> for a particular SolrCore, I can use the data for the appropriate core.  
>> Ultimately, I need a lucene IndexReader for each core.  I figure I'd get 
>> this through a SolrCore, CoreContainer, or CoreDescriptor.  I've looked 
>> around for awhile and I always end up going in circles.  So how can I 
>> iterate over cores that have been loaded?
>>
>> Ryan
>
>
>



Advanced Search Option in Solr corresponding to DtSearch options

2013-02-06 Thread Soumyanayan Kar
Hi,

 

We are replacing the search and indexing module in an application from
DtSearch to Solr, using SolrNet as the .NET Solr client library.

 

We are relatively new to Solr/Lucene and would need some help/direction to
understand the more advanced search options in Solr.

 

The current application supports the following search options using
DtSearch:

 

1) Word(s) or phrase
2) Exact words or phrases
3) Not these words or phrases
4) One or more of words ("A" OR "B" OR "C")
5) Proximity of word within n words of another word
6) Numeric range - From - To
7) Options:
   - Stemming (search* finds searching or searches)
   - Synonym (search& finds seek or look)
   - Fuzzy within n letters (p%arts finds paris)
   - Phonic homonyms (#Smith also finds Smithe and Smythe)

 

As an example, the search query that gets generated to be posted to DtSearch
for the use case below:

1. Search phrase: generic collection
2. Exact phrase: linq
3. Not these words: sql
4. One or more of these words: ICollection or ArrayList or Hashtable
5. Proximity: csharp within 4 words of language
6. Options:
   a. Stemming
   b. Synonym
   c. Fuzzy within 2 letters
   d. Phonic homonyms

Search query: generic* collection* generic& collection& #generic #collection
g%%eneric c%%ollection "linq" -sql ICollection OR ArrayList OR Hashtable
csharp w/4 language

 

We have been able to do simple searches (single-term search in file
content) with highlighting in Solr. Now we need to replicate these options
with Solr/Lucene.

Can anybody provide some direction on what/where we should be looking?

 

Thanks & Regards,

 

Soumya.
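
As a rough starting point, approximate Solr/Lucene 4.x equivalents for the
DtSearch operators above; stemming, synonyms and phonetic matching are
handled at index time in the field's analyzer chain rather than by query
syntax (a sketch, not a complete mapping; field names are placeholders):

    Word(s) or phrase:        generic collection
    Exact phrase:             "linq"
    Not these words:          -sql
    One or more of words:     ICollection OR ArrayList OR Hashtable
    Proximity within 4 words: "csharp language"~4
    Numeric range:            price:[10 TO 100]
    Fuzzy within 2 letters:   parts~2
    Stemming:                 e.g. PorterStemFilterFactory in the analyzer
    Synonyms:                 e.g. SynonymFilterFactory in the analyzer
    Phonic homonyms:          e.g. PhoneticFilterFactory in the analyzer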

 

 



Long running query triggers full copy in Solr 4.1

2013-02-06 Thread Niran Fajemisin
Hi all,

I have noticed the following occur with some consistency: When I execute a long 
running query (that spans 15 or more seconds), the Solr node that is servicing 
the request starts to perform a full copy from the shard leader. My current 
configuration has only one shard with 3 replicas. Note that there are no 
updates happening on any of the Solr nodes in the cluster; hence there really 
shouldn't be any changes to the underlying index, nor is there any need to 
synchronize the index with other replicas. 

I'm just trying to understand what kind of events can typically trigger syncing 
the index with other replicas, specifically resulting in a full copy, when no 
updates have been made. Could this have to do with some timeout settings 
for ZooKeeper, where the Solr server is unable to respond to a heartbeat 
request to report its state? 

Any pointers would be greatly appreciated. Thanks!

-Niran  

copyField vs single field

2013-02-06 Thread adm1n
Hi,

Let's assume I have to search for a string (textField) in 6-7 different
fields (username, firstname, lastname, etc). Which one will have better
performance:
username:test OR firstname:test OR lastname:test
or defining some copyField and searching within it like somecopyfield:test


thanks. 





Re: Updating data

2013-02-06 Thread anurag.jain
Hi, thanks for the reply.

But I was doing the same thing, and whenever I tried to update, the previous
fields were automatically deleted. :(

I don't understand why.





Re: Multi-threaded post.jar?

2013-02-06 Thread Otis Gospodnetic
Btw, wouldn't this be a chance to create a Solr CLI tool, much like
es2unix?  Maybe with a shell? I'm offline now, but I recently came across
a Java lib that makes this easy... jclam jsomething ...

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 6, 2013 8:48 AM, "Jan Høydahl"  wrote:

> With dependencies I meant external jar dependencies. Perhaps extensions
> could have deps while leaving the "core" compilable without?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 5. feb. 2013 kl. 17:10 skrev Upayavira :
>
> > By dependencies, do you mean other java classes? I was thinking of
> > splitting it out into a few classes, each of which is clearer in its
> > purpose.
> >
> > Upayavira
> >
> > On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
> >> Wiki page exists already: http://wiki.apache.org/solr/post.jar
> >>
> >> I'm happy to consider a refactoring, especially if it makes it SIMPLER to
> >> read and interact with and doesn't add a ton of mandatory dependencies.
> >> It should probably still be possible to say something like
> >>
> >>  javac org/apache/solr/util/SimplePostTool.java
> >>  java -cp . org.apache.solr.util.SimplePostTool -h
> >>
> >> That's just how I've been thinking so far though. If other committers
> are
> >> happy with abandoning the simple-ness and instead create a
> best-practices
> >> based feature-rich tool with dependencies, then I'll not object.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> 5. feb. 2013 kl. 05:22 skrev Upayavira :
> >>
> >>> Thx Jan,
> >>>
> >>> All I know is I've got a data set of 500k documents, Solr formatted,
> and
> >>> I want it to be as easy as possible to get them into Solr. I also want
> >>> to be able to show the benefit of multithreading. The outcome would
> >>> really be "make sure your code uses multiple threads to push to Solr"
> >>> rather than "use post.jar in production". I see post.jar as a
> >>> demonstration tool, rather than anything else, and am considering
> adding
> >>> another feature to enhance that.
> >>>
> >>> However, I did stall once I started looking at the SimplePostTool.jar
> >>> class, because it is losing its connection with the term 'Simple'.
> >>> Adding multithreading, however useful, correct, whatever, would
> >>> completely push it over the edge. Thus, I think the proper approach is
> >>> to refactor the tool into a number of classes, and only then think
> about
> >>> adding multithreading as a completely separate affair. I'm more than
> >>> happy to have a go at that refactoring, especially if you're prepared
> to
> >>> review it.
> >>>
> >>> I guess the other thing that is much needed is a wiki page that details
> >>> the features of the tool, and also explains that its role is
> >>> educational, rather than anything else.
> >>>
> >>> Upayavira
> >>>
> >>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
>  Hi,
> 
>  Hmm, the tool is getting bloated for a one-class no-deps tool already
> :)
>  Guess it would be useful too with real-life code examples using SolrJ
> and
>  other libs as well (such as robots.txt lib, commons-cli etc), but
> whether
>  that should be an extension of SimplePostTool or a totally new tool
> from
>  scratch is something to discuss. Please bring on your ideas of how you
>  plan to extend it, perhaps even simplifying the code in the process?
> 
>  --
>  Jan Høydahl, search solution architect
>  Cominvent AS - www.cominvent.com
>  Solr Training - www.solrtraining.com
> 
>  3. feb. 2013 kl. 17:19 skrev Upayavira :
> 
> > I have a scenario in which I need to post 500,000 documents to Solr
> as a
> > test. I have these documents in XML files already formatted in Solr's
> > xml format.
> >
> > Posting to Solr using post.jar it takes 1m55s. With a bit of bash
> > jiggery-pokery, I was able to get this down to 1m08s by running four
> > concurrent post.jar instances, which strikes me as a significant
> > improvement.
> >
> > I'm considering adding multithreaded capabilities to post.jar, but
> > before I go to that effort, I wanted to see if anyone else would
> > consider it a useful feature. Given that the SimplePostTool is
> becoming
> > far from simple, I wanted to see whether the feature is likely to be
> > accepted before I put in the effort. Also, I would need to consider
> > which parts of the tool to add that to. Currently I only want it for
> > posting XML docs, but there's also crawling capabilities in it too.
> >
> > Thoughts?
> >
> > Upayavira
> 
> >>
>
>


OR OR OR

2013-02-06 Thread anurag.jain
In my query there are many ORs; after 79 or 80 ORs it gives an error that the
URL is too large.


http://xvz/solr/select?q=*:*&fq=institute_name:"xyz" OR
institute_name:"sfsda" OR institute_name:"sdfsaf" ..


I found that the query can be sent through POST, but I don't know how.
Can you please tell me how to do this? Please reply. :( Urgent.





Re: OR OR OR

2013-02-06 Thread Shawn Heisey

On 2/6/2013 12:41 PM, anurag.jain wrote:

In my query there are many ORs; after 79 or 80 ORs it gives an error that the
URL is too large.


http://xvz/solr/select?q=*:*&fq=institute_name:"xyz" OR
institute_name:"sfsda" OR institute_name:"sdfsaf" ..


I found that the query can be sent through POST, but I don't know how.
Can you please tell me how to do this? Please reply. :( Urgent.


If you used this query format instead, your URL would be smaller.  Note 
that the quotes are only required if you want to do an explicit phrase 
query:


&fq=institute_name:("xyz" OR "sfsda" OR "sdfsaf")

You can increase the URL length (HTTP header buffer) that your servlet 
container will allow.  Exactly how to do this would depend on which 
container you're using to deploy Solr.


Going to POST is a better option.  The way to do that would depend on 
what program or Solr development API you're using to access Solr.  If 
you're using SolrJ in a Java program, it should do that automatically.
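
With SolrJ, for example, a minimal sketch (host and field values are
placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Sending the query via POST sidesteps URL length limits entirely.
    public class PostQueryExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("institute_name:(\"xyz\" OR \"sfsda\" OR \"sdfsaf\")");
        QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
      }
    }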


Thanks,
Shawn



Re: OR OR OR

2013-02-06 Thread Upayavira
Also, OR is the default, so you can improve on it with:

&fq=institute_name:("xyz" "sfsda" "sdfsaf")

Upayavira

On Wed, Feb 6, 2013, at 08:17 PM, Shawn Heisey wrote:
> On 2/6/2013 12:41 PM, anurag.jain wrote:
> > In my query there are many ORs; after 79 or 80 ORs it gives an error that the
> > URL is too large.
> >
> >
> > http://xvz/solr/select?q=*:*&fq=institute_name:"xyz" OR
> > institute_name:"sfsda" OR institute_name:"sdfsaf" ..
> >
> >
> > I found that the query can be sent through POST, but I don't know how.
> > Can you please tell me how to do this? Please reply. :( Urgent.
> 
> If you used this query format instead, your URL would be smaller.  Note 
> that the quotes are only required if you want to do an explicit phrase 
> query:
> 
> &fq=institute_name:("xyz" OR "sfsda" OR "sdfsaf")
> 
> You can increase the URL length (HTTP header buffer) that your servlet 
> container will allow.  Exactly how to do this would depend on which 
> container you're using to deploy Solr.
> 
> Going to POST is a better option.  The way to do that would depend on 
> what program or Solr development API you're using to access Solr.  If 
> you're using SolrJ in a Java program, it should do that automatically.
> 
> Thanks,
> Shawn
> 


SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Gregg Donovan
In the process of upgrading from 3.6 to 4.1, we've noticed that much of the
code we had that relied on the 3.6 behavior of SolrCore#getIndexDir() is
not working the same way.

In 3.6, SolrCore#getIndexDir() would get us the index directory read from
index.properties, if it existed, otherwise it would return dataDir +
"index/".  As of svn 1420992 [1], SolrCore#getIndexDir() just
returns dataDir + "index/" and does not take index.properties into account.

This has me wondering what the intended state of support for
index.properties is in 4.1. After reading the code for some of the relevant
components -- Core admin, HTTP Replication, etc. -- I'm somewhat confused.

--In CoreAdminHandler#handleUnloadAction(SolrQueryRequest,
SolrQueryResponse) if the deleteIndex flag is set to true, it calls
core.getDirectoryFactory().remove(core.getIndexDir()). If a value other
than index/ is set in index.properties, won't this delete the wrong
directory?

--In CoreAdminHandler#getIndexSize(SolrCore), the existence of
SolrCore#getIndexDir() is checked before SolrCore#getNewIndexDir(). If a
value other than index/ is set in index.properties, won't this return the
size of the wrong directory?

Seeing these two examples, I wondered if index.properties and the use of
directories other than <dataDir>/index/ was deprecated, but I see that
SnapPuller will create a new directory within <dataDir> and update
index.properties to point to it in cases where isFullCopyNeeded=true.

Our current Solr 3.6 reindexing scheme works by modifying index.properties
to point to a new directory and then doing a core reload. I'm wondering if
this method is intended to be deprecated at this point, or if the SolrCloud
scenarios are just getting more attention and some bugs have slipped into
the older code paths. I can certainly appreciate that it's tough to make
the changes needed for SolrCloud while maintaining perfect compatibility in
pre-Cloud code paths. Would restoring the previous contract of
SolrCore#getIndexDir() break anything in SolrCloud?

Thanks!

--Gregg


Gregg Donovan
Senior Software Engineer, Etsy.com
gr...@etsy.com

[1]
http://svn.apache.org/viewvc?diff_format=h&view=revision&revision=1420992


Re: Updating data

2013-02-06 Thread Alexandre Rafalovitch
Solr Atomic update requires all fields to be stored. Were they?
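
For reference, a minimal SolrJ sketch of an atomic "set" update (host, id and
field names are placeholders); with all fields stored, Solr rewrites the whole
document and preserves the fields you don't touch:

    import java.util.Collections;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        // A Map value tells Solr to "set" just this field, not replace the doc.
        doc.addField("title", Collections.singletonMap("set", "new title"));
        server.add(doc);
        server.commit();
      }
    }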

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Feb 6, 2013 at 2:35 PM, anurag.jain  wrote:

> Hi, thanks for the reply.
>
> But I was doing the same thing, and whenever I tried to update, the previous
> fields were automatically deleted. :(
>
> I don't understand why.
>
>
>
>


Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Mark Miller
I think you must be confused?

Solr 3.6 and before worked the same way - getIndexDir never read the props file 
- getNewIndexDir always did. This was added when replication was added as far 
as I remember.

The idea is that you use that when constructing new searchers/writers, but you 
shouldn't need to in other cases - I think a comment exists somewhere to that 
effect.

In any case, none of the replication handler changes are SolrCloud specific - 
they are all replication handler specific. We don't ask anything special of the 
replication handler in SolrCloud mode, other than adding a force option that 
guarantees a replication.

- Mark

On Feb 6, 2013, at 4:23 PM, Gregg Donovan  wrote:

> In the process of upgrading from 3.6 to 4.1, we've noticed that much of the
> code we had that relied on the 3.6 behavior of SolrCore#getIndexDir() is
> not working the same way.
> 
> In 3.6, SolrCore#getIndexDir() would get us the index directory read from
> index.properties, if it existed, otherwise it would return dataDir +
> "index/".  As of svn 1420992 [1], SolrCore#getIndexDir() just
> returns dataDir + "index/" and does not take index.properties into account.
> 
> This has me wondering what the intended state of support for
> index.properties is in 4.1. After reading the code for some of the relevant
> components -- Core admin, HTTP Replication, etc. -- I'm somewhat confused.
> 
> --In CoreAdminHandler#handleUnloadAction(SolrQueryRequest,
> SolrQueryResponse) if the deleteIndex flag is set to true, it calls
> core.getDirectoryFactory().remove(core.getIndexDir()). If a value other
> than index/ is set in index.properties, won't this delete the wrong
> directory?
> 
> --In CoreAdminHandler#getIndexSize(SolrCore), the existence of
> SolrCore#getIndexDir() is checked before SolrCore#getNewIndexDir(). If a
> value other than index/ is set in index.properties, won't this return the
> size of the wrong directory?
> 
> Seeing these two examples, I wondered if index.properties and the use of
> directories other than <dataDir>/index/ was deprecated, but I see that
> SnapPuller will create a new directory within <dataDir> and update
> index.properties to point to it in cases where isFullCopyNeeded=true.
> 
> Our current Solr 3.6 reindexing scheme works by modifying index.properties
> to point to a new directory and then doing a core reload. I'm wondering if
> this method is intended to be deprecated at this point, or if the SolrCloud
> scenarios are just getting more attention and some bugs have slipped into
> the older code paths. I can certainly appreciate that it's tough to make
> the changes needed for SolrCloud while maintaining perfect compatibility in
> pre-Cloud code paths. Would restoring the previous contract of
> SolrCore#getIndexDir() break anything in SolrCloud?
> 
> Thanks!
> 
> --Gregg
> 
> 
> Gregg Donovan
> Senior Software Engineer, Etsy.com
> gr...@etsy.com
> 
> [1]
> http://svn.apache.org/viewvc?diff_format=h&view=revision&revision=1420992



Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Mark Miller

On Feb 6, 2013, at 4:23 PM, Gregg Donovan  wrote:

> code we had that relied on the 3.6 behavior of SolrCore#getIndexDir() is
> not working the same way.

Can you be very specific about the different behavior that you are seeing? What 
exactly were you seeing and counting on, and what are you seeing now? 

- Mark

Multi-select faceting is not working when facet fields are configured in default request handler.

2013-02-06 Thread manivanann
Hi solr-user,

   In my work I have to do multi-select faceting. We have already configured
facet fields globally in the default request handler (solrconfig.xml). For
multi-select faceting I have done the query with an exclusion filter, but it's
not working. The following is my query:

http://192.168.101.141:8080/solr/select?q=digital+camera&rows=0&facet=on&fq={!tag=Br}Brands:canon&facet.field={!ex=Br}Brands

But if I try after removing all the facet fields from my request handler in
solrconfig.xml, then the above query works fine.

Please can you give me a solution? Will multi-select faceting work with the
current configuration, or do I have to remove all the facet fields from my
request handler and instead send the facet fields dynamically through the
query when doing multi-select faceting?





Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Gregg Donovan
Mark-

You're right that SolrCore#getIndexDir() did not directly read
index.properties in 3.6. In 3.6, it gets it indirectly from what is passed
to the constructor of SolrIndexSearcher. Here's SolrCore#getIndexDir() in
3.6:

  public String getIndexDir() {
synchronized (searcherLock) {
  if (_searcher == null)
return dataDir + "index/";
  SolrIndexSearcher searcher = _searcher.get();
  return searcher.getIndexDir() == null ? dataDir + "index/" :
searcher.getIndexDir();
}
  }

In 3.6 the only time I see a new SolrIndexSearcher created without the
results of SolrCore#getNewIndexDir() getting passed in somehow would be if
SolrCore#newSearcher(String, boolean) is called manually before any other
SolrIndexSearcher. Otherwise, it looks like getNewIndexDir() is getting
passed to new SolrIndexSearcher which is then reflected back
in SolrCore#getIndexDir().

So, in 3.6 we had been able to rely on SolrCore#getIndexDir() giving us
either the value the index referenced in index.properties OR dataDir +
"index/" if index.properties was missing. In 4.1, it always gives us
dataDir + "index/".

Here's the comment in 3.6 on SolrCore#getNewIndexDir() that I think you
were referring to. The comment is unchanged in 4.1:

  /**
   * Returns the indexdir as given in index.properties. If index.properties
exists in dataDir and
   * there is a property index available and it points to a valid
directory
   * in dataDir that is returned Else dataDir/index is returned. Only
called for creating new indexSearchers
   * and indexwriters. Use the getIndexDir() method to know the active
index directory
   *
   * @return the indexdir as given in index.properties
   */
  public String getNewIndexDir() {

*"Use the getIndexDir() method to know the active index directory"* is the
behavior that we were reliant on. Since it's now hardcoded to dataDir +
"index/", it doesn't always return the active index directory.

--Gregg

On Wed, Feb 6, 2013 at 5:13 PM, Mark Miller  wrote:

>
> On Feb 6, 2013, at 4:23 PM, Gregg Donovan  wrote:
>
> > code we had that relied on the 3.6 behavior of SolrCore#getIndexDir() is
> > not working the same way.
>
> Can you be very specific about the different behavior that you are seeing?
> What exactly were you seeing and counting on, and what are you seeing now?
>
> - Mark


Re: copyField vs single field

2013-02-06 Thread Otis Gospodnetic
The latter,  I believe,  but you lose the ability to give different weights
to matches on different fields.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 6, 2013 2:34 PM, "adm1n"  wrote:

> Hi,
>
> Let's assume I have to search for a string (textField) in 6-7 different
> fields (username, firstname, lastname, etc). Which one will have better
> performance:
> username:test OR firstname:test OR lastname:test
> or defining some copyField and searching within it like somecopyfield:test
>
>
> thanks.
>
>
>
>


Re: copyField vs single field

2013-02-06 Thread Jack Krupansky
It is difficult to say for sure - unless somebody actually does a lot of 
benchmarking tests with various distributions of data in the fields and 
various field types (e.g., some are strings and some are text, and the 
cardinality of the string values.) I would suspect that the two would be 
roughly equivalent. I mean, if you search each field separately, that field 
has only its subset of the data, and the copy field has essentially the sum 
of the per-field subsets.


I would say that you should go with an edismax (dismax) search (qf = list of 
fields and boosts) unless you have a clear reason to go the other way.
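
A minimal SolrJ sketch of that (field names and boosts are placeholders):

    SolrQuery q = new SolrQuery("test");
    q.set("defType", "edismax");
    q.set("qf", "username^2 firstname lastname");
    // URL equivalent: q=test&defType=edismax&qf=username^2+firstname+lastname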


-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Wednesday, February 06, 2013 8:04 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField vs single field

The latter,  I believe,  but you lose the ability to give different weights
to matches on different fields.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 6, 2013 2:34 PM, "adm1n"  wrote:


Hi,

Let's assume I have to search for a string (textField) in 6-7 different
fields (username, firstname, lastname, etc). Which one will have better
performance:
username:test OR firstname:test OR lastname:test
or defining some copyField and searching within it like somecopyfield:test


thanks.








Re: SolrCore#getIndexDir() contract change between 3.6 and 4.1?

2013-02-06 Thread Mark Miller
Thanks Gregg - can you file a JIRA issue?

- Mark

On Feb 6, 2013, at 5:57 PM, Gregg Donovan  wrote:

> Mark-
> 
> You're right that SolrCore#getIndexDir() did not directly read
> index.properties in 3.6. In 3.6, it gets it indirectly from what is passed
> to the constructor of SolrIndexSearcher. Here's SolrCore#getIndexDir() in
> 3.6:
> 
>  public String getIndexDir() {
>synchronized (searcherLock) {
>  if (_searcher == null)
>return dataDir + "index/";
>  SolrIndexSearcher searcher = _searcher.get();
>  return searcher.getIndexDir() == null ? dataDir + "index/" :
> searcher.getIndexDir();
>}
>  }
> 
> In 3.6 the only time I see a new SolrIndexSearcher created without the
> results of SolrCore#getNewIndexDir() getting passed in somehow would be if
> SolrCore#newSearcher(String, boolean) is called manually before any other
> SolrIndexSearcher. Otherwise, it looks like getNewIndexDir() is getting
> passed to new SolrIndexSearcher which is then reflected back
> in SolrCore#getIndexDir().
> 
> So, in 3.6 we had been able to rely on SolrCore#getIndexDir() giving us
> either the value the index referenced in index.properties OR dataDir +
> "index/" if index.properties was missing. In 4.1, it always gives us
> dataDir + "index/".
> 
> Here's the comment in 3.6 on SolrCore#getNewIndexDir() that I think you
> were referring to. The comment is unchanged in 4.1:
> 
>  /**
>   * Returns the indexdir as given in index.properties. If index.properties
> exists in dataDir and
>   * there is a property index available and it points to a valid
> directory
>   * in dataDir that is returned Else dataDir/index is returned. Only
> called for creating new indexSearchers
>   * and indexwriters. Use the getIndexDir() method to know the active
> index directory
>   *
>   * @return the indexdir as given in index.properties
>   */
>  public String getNewIndexDir() {
> 
> *"Use the getIndexDir() method to know the active index directory"* is the
> behavior that we were reliant on. Since it's now hardcoded to dataDir +
> "index/", it doesn't always return the active index directory.
> 
> --Gregg
> 
> On Wed, Feb 6, 2013 at 5:13 PM, Mark Miller  wrote:
> 
>> 
>> On Feb 6, 2013, at 4:23 PM, Gregg Donovan  wrote:
>> 
>>> code we had that relied on the 3.6 behavior of SolrCore#getIndexDir() is
>>> not working the same way.
>> 
>> Can you be very specific about the different behavior that you are seeing?
>> What exactly where you seeing and counting on and what are you seeing now?
>> 
>> - Mark



Re: What is the graceful shutdown API for Solrj embedded?

2013-02-06 Thread Shawn Heisey

On 2/6/2013 8:05 PM, Alexandre Rafalovitch wrote:

Hello,

When I CTRL-C the example Solr, it prints a bunch of graceful shutdown
messages.  I assume it shuts down safe and without corruption issues.

When I do that to Solrj (embedded, not remote), it just drops dead.

I found CoreContainer.shutdown(), which looks about right and does
terminate Solrj but it prints out a completely different set of messages.

Is CoreContainer.shutdown() the right method for Solrj (4.1)? Is there more
than just one call?

And what happens if you just Ctrl-C Solrj instance? Wiki says nothing about
shutdown, so I can imagine a lot of people probably think it is ok to just
kill it. Is there a danger of corruption?


I don't know the proper way to shut things down, but 
CoreContainer.shutdown() is probably part of it.  I can give you some 
information about your Ctrl-C observations, though.


When you interrupt the example Solr, you're interrupting Jetty, not 
Solr.  Jetty is a battle-tested servlet container that implements a very 
extensive shutdown hook.  As a servlet in a servlet container, Solr very 
likely interfaces into that shutdown hook.


When you interrupt SolrJ with EmbeddedSolrServer, that's an application 
that you wrote.  If you haven't implemented a shutdown hook, then the 
application will simply die, and it's possible you could encounter data 
loss.  You'll have to implement a shutdown hook that closes everything 
properly.


Some caveats I've learned about shutdown hooks: 1) Don't call 
System.exit() from within your hook thread.  This creates infinite 
recursion.  2) If you can't be sure that all the threads in your app 
have stopped by the time your hook thread ends, you'll have to halt the JVM 
to ensure that the program actually exits.


http://docs.oracle.com/javase/6/docs/api/java/lang/Runtime.html#halt%28int%29
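
Putting that together, a minimal sketch of such a hook for an embedded Solr
(the CoreContainer setup shown is the 4.1-era initializer; the solr home path
and core name are placeholders):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedShutdownExample {
      public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        final CoreContainer container = new CoreContainer.Initializer().initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

        Runtime.getRuntime().addShutdownHook(new Thread() {
          @Override
          public void run() {
            // Do NOT call System.exit() here (see caveat 1 above); just
            // release Solr's resources so the index closes cleanly.
            container.shutdown();
          }
        });

        // ... index and query through server ...
      }
    }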

Thanks,
Shawn



Re: Adding replacement node to 4.x cluster with down node

2013-02-06 Thread adfel70
Why is it that "other nodes obviously won't be able to talk to this tmp node"?
Can you elaborate?


Mark Miller-3 wrote
> https://issues.apache.org/jira/browse/SOLR-4078 will be useful - that
> should make it in 4.3.
> 
> Until then, you want to get that node out, and you need the new node to be
> assigned to the same shard.
> 
> I guess I might try:
> 
> Add your new node - explicitly tell it what shard to join by setting the
> shard id.
> 
> Start a new tmp node and set the host value to the old node that is gone.
> This will work, though other nodes obviously won't be able to talk to this
> tmp node - it will think it's the missing node though. Now send a core
> unload command to it - that should cleanly remove it from the cluster.
> Stop/remove the tmp node. 
> 
> 
> - Mark
> 
> 
> On Feb 5, 2013, at 12:22 PM, Mike Schultz <

> mike.schultz@

> > wrote:
> 
>> Just to clarify,  I want to be able to replace the down node with a host
>> with
>> a different name.  If I were repairing that particular machine and
>> replacing
>> it, there would be no problem.  But I don't have control over the name of
>> my
>> replacement machine.
>> 
>> 
>> 




