Re: MergingSolrIndexes not supported by SolrCloud? Why?

2014-04-01 Thread rulinma
I made it work. It was my mistake.





Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-04-01 Thread Salman Akram
So you too never got any response...


On Mon, Mar 31, 2014 at 6:57 PM, Luis Lebolo  wrote:

> Hi Salman,
>
> I was interested in something similar, take a look at the following thread:
>
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCADSoL-i04aYrsOo2%3DGcaFqsQ3mViF%2Bhn24ArDtT%3D7kpALtVHzA%40mail.gmail.com%3E#archives
>
> I never followed through, however.
>
> -Luis
>
>
> On Mon, Mar 31, 2014 at 6:24 AM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > Anyone?
> >
> >
> > On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> > > With reference to this thread<
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E
> >I
> > wanted to know if there was any response to that or if Chris Harris
> > > himself can comment on what he ended up doing, that would be great!
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> > >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>



-- 
Regards,

Salman Akram


transaction log size

2014-04-01 Thread Gurfan
Hi,

As the transaction log (tlog) plays an important role while restarting the
SolrCloud cluster, we are trying to decrease its size. Many of the posts we
found on the net describe that
"decreasing autoCommit and increasing autoSoftCommit would generate a
smaller transaction log".

To test this statement, we executed some runs:

Document Size: ~2KB.

1st Run:

AutoCommit: 30 Sec
autoSoftCommit: 20 Sec
openSearcher:  false
Index size: 4.7 GB
Transaction log:
   Master: 740KB
   Slave: 86 MB

2nd Run:

AutoCommit: 20 Sec
autoSoftCommit: 30 Sec
openSearcher:  false
Index size: 4.7 GB
Transaction log:
   Master: 740KB
   Slave: 202 MB


Please find attached a zip containing the cluster (master, slave) transaction
log disk usage, sampled at 1-minute intervals.

transactionLog.zip
  

> Schema.xml
> Solr-config.xml
> transactionLog/master/*
> transactionLog/slave/*



Could you please give us some pointers so that we can control transaction
log (tlog) generation?

Thanks,
--Gurfan





Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread elisabeth benoit
Thanks a lot for your answers!

Shawn, our GC configuration has far fewer parameters defined, so we'll check
this out.

Dmitry, about the expungeDeletes option: we'll add that in the delete
process. But from what I read, this is done in the optimize process (cf.
http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html).
Or maybe not?

Thanks again,
Elisabeth


2014-04-01 7:52 GMT+02:00 Dmitry Kan :

> Hi,
>
> We have noticed something like this as well, but with older versions of
> solr, 3.4. In our setup we delete documents pretty often. Internally in
> Lucene, when a document is client requested to be deleted, it is not
> physically deleted, but only marked as "deleted". Our original optimization
> assumption was such that the "deleted" documents would get physically
> removed on each optimize command issued. We started to suspect it wasn't
> always true as the shards (especially relatively large shards) became
> slower over time. So we found out about the expungeDeletes option, which
> purges the "deleted" docs and is by default false. We have set it to true.
> If your solr update lifecycle includes frequent deletes, try this out.
>
> This of course does not override working towards finding better
> GCparameters.
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
>
>
> On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Hello,
> >
> > We are currently using solr 4.2.1. Our index is updated on a daily basis.
> > After noticing solr query time has increased (two times the initial size)
> > without any change in index size or in solr configuration, we tried an
> > optimize on the index but it didn't fix our problem. We checked the
> garbage
> > collector, but everything seemed fine. What did in fact fix our problem
> was
> > to delete all documents and reindex from scratch.
> >
> > It looks like over time our index gets "corrupted" and optimize doesn't
> fix
> > it. Does anyone have a clue how to investigate further this situation?
> >
> >
> > Elisabeth
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>


Asp.net MVC 4 and Solr Query Beginning

2014-04-01 Thread danielkrudolf
Hello to all, I am new to Solr, but I see it is very usable. I want to
build a web application with asp.net MVC 4 that shows query results from Solr.

OK, so far this is what I have done:

1) Open new project in Visual Studio 2012 and create new MVC 4 project
(Empty solution).

2) In the Package Manager Console I installed these packages:
SolrNet (core library)
SolrNet.Windsor
SolrNet.StructureMap
SolrNet.Ninject
SolrNet.Unity
SolrNet.Autofac
SolrNet.NHibernate

3) I created a new model class named "Poskus" and wrote this code:
using SolrNet.Attributes;

namespace TestSolr2.Models
{
public class Poskus
{

[SolrField("customer")]
public string Customer { get; set; }

}
}

4) Next I created a new controller named "PoskusSolr" and wrote this
code:

using System;
using System.Web.Mvc;
using SolrNet;
using SolrNet.DSL;
using TestSolr.Models;
using SolrNet.Impl;

namespace TestSolr.Controllers
{
public class PoskusSolrController : Controller
{
//
// GET: /PoskusSolr/

public ActionResult Index()
{

try
{
var connection = new
SolrConnection("http://servicemix/.../msglog_pilot";);
Startup.Init(connection);

var pos1 = Solr.Query(new SolrQueryByField("name",
"customer"));

return View(pos1);
}
catch (Exception ex)
{
string error = ex.Message;
}
return View();
}   
}
}

5) And finally I created the View (Index).

So far I put this:

@model TestSolr.Models.Poskus
@{
ViewBag.Title = "Index";
}

Index

-
Now, this code does not work; it does not return customers from my Solr
database. Any suggestions, ideas, or links to make this work? Really, many
thanks for the help.

Daniel






Re: ranking retrieval measure

2014-04-01 Thread Rahul Singh
One of the measurement criteria is DCG (Discounted Cumulative Gain):
http://en.wikipedia.org/wiki/Discounted_cumulative_gain



On Tue, Apr 1, 2014 at 11:44 AM, Floyd Wu  wrote:

> Usually IR system is measured using Precision & Recall.
> But depends on what kind of system you are developing to fit what scenario.
>
> Take a look
> http://en.wikipedia.org/wiki/Precision_and_recall
>
>
>
> 2014-04-01 10:23 GMT+08:00 azhar2007 :
>
> > Hi people. Ive developed a search engine to implement and improve it
> using
> > another search engine as a test case. Now I want to compare and test
> > results
> > from both to determine which is better. I am unaware of how to do this so
> > someone please point me in the right direction.
> >
> > Regards
> >
> >
> >
> >
>


Re: Asp.net MVC 4 and Solr Query Beginning

2014-04-01 Thread Nazik
Hi Daniel,

I think you should post this to the SolrNet Google group:

https://groups.google.com/forum/m/#!forum/solrnet

That forum is more appropriate for addressing this type of problem.

@Nazik_Huq 


On Apr 1, 2014, at 5:30 AM, danielkrudolf  wrote:

> Hello to all, I am new in Solr, but I see it si very usable. So I whant to
> build web application with asp.net MVC 4 that shows query from Solr. 
> 
> Ok, let's go so far i have done this:
> 
> 1) Open new project in Visual Studio 2012 and create new MVC 4 project
> (Empty solution).
> 
> 2) In Package Manager Console I have Install next packages:
> /SolrNet (core library)
> SolrNet.Windsor
> SolrNet.StructureMap
> SolrNet.Ninject
> SolrNet.Unity
> SolrNet.Autofac
> SolrNet.NHibernate/
> 
> 3) I create new Model class with name "Poskus" and write this code:
> using SolrNet.Attributes;
> 
> /namespace TestSolr2.Models
> {
>public class Poskus
>{
> 
>[SolrField("customer")]
>public string Customer { get; set; }
> 
>}
> }/
> 
> 4) Next I have create new Controller with name "PoskusSolr" and write this
> code:
> 
> /using System;
> using System.Web.Mvc;
> using SolrNet;
> using SolrNet.DSL;
> using TestSolr.Models;
> using SolrNet.Impl;
> 
> namespace TestSolr.Controllers
> {
>public class PoskusSolrController : Controller
>{
>//
>// GET: /PoskusSolr/
> 
>public ActionResult Index()
>{
> 
>try
>{
>var connection = new
> SolrConnection("http://servicemix/.../msglog_pilot";);
>Startup.Init(connection);
> 
>var pos1 = Solr.Query(new SolrQueryByField("name",
> "customer"));
> 
>return View(pos1);
>}
>catch (Exception ex)
>{
>string error = ex.Message;
>}
>return View();
>}   
>}
> }/
> 
> 5) And finaly I have create View (Index) 
> 
> So far I put this:
> 
> /@model TestSolr.Models.Poskus
> @{
>ViewBag.Title = "Index";
> }
> 
> Index
> /
> 
> -
> Now, this code does not work, it not returns customers from my Solr databse.
> Any sugestions, ideas, links to make this work. Realy, realy thanks for
> help.
> 
> Daniel
> 
> 
> 
> 


sort by an attribute values sequence

2014-04-01 Thread santosh sidnal
Hi All,

We have a specific requirement to sort products according to a specific
attribute value sequence. Any pointer or source of info would help us.

Example of the scenario:

Let's say for a search result I want to sort results based on an attribute
producttype, where producttype has the following values: A, B, C, D.

In the Solr query I can give either producttype asc or producttype desc.

But I want to get results in a specific order: first all results with value
'C', then B, then A, then D.
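For illustration, something along these lines is what I am after (a sketch,
assuming a Solr 4.x function-query sort over a single-valued indexed field;
I don't know if termfreq is the right tool here):

sort=termfreq(producttype,'C') desc, termfreq(producttype,'B') desc, termfreq(producttype,'A') desc, producttype asc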


-- 
Regards,
Santosh Sidnal


Re: Asp.net MVC 4 and Solr Query Beginning

2014-04-01 Thread danielkrudolf
Nazik, thanks for the help. Are there similar forums? This one seems not to
be working; I can't post a new subject or question.

Thanks for the help.





Re: Block until replication finishes

2014-04-01 Thread Fermin Silva
The ReplicationHandler class is not the most exemplary code to be looking at.
I found however the line that could be changed:

new Thread() {
  @Override
  public void run() {
    doFetch(paramsCopy);
  }
}.start();
rsp.add(STATUS, OK_STATUS);

It should be really simple to join on that thread depending on a REST
parameter. I would change that code myself (which I did in my custom SOLR
installation), but I guess the fix should go into SOLR 4.x and not 3.x.
Sorry, but I have no clue about how to contribute code. I will check that,
but if someone can point me in the right direction it would be nice.

Thanks


On Sat, Mar 29, 2014 at 9:49 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello,
> We did this for our fork, if you are not happy with "RESTful polling", or
> think that the synchronous replication handler might be useful, please
> raise a jira.
>  On 27.03.2014 at 17:35, "Fermin Silva"  wrote:
>
> > Hi,
> >
> > we are moving to native replication with SOLR 3.5.1.
> > Because we want to control the replication from another program (a cron
> > job), we decided to curl the slave to issue a fetchIndex command.
> >
> > The problem we have is that the curl returns immediately, while the
> > replication still goes in the background.
> > We need to know when the replication is done, and then resume the cron
> job.
> >
> > Is there a way to block on the replication call until it's done similar
> to
> > waitForSearcher=true when committing ?
> > If not, what other possibilities we have?
> >
> > Just in case, here is the solrconfig part in the slave (we pass masterUrl
> > in the curl url)
> >
> > 
> > 
> >   
> > 
> >   
> >
> >
> > Many thanks in advance
> >
> > --
> > Fermin Silva
> >
>



-- 
Fermin Silva
Speed & Scalability Team


High CPU usage after import

2014-04-01 Thread Александр Вандышев

I use the update/extract handler for indexing a large number of files. If the
CPU load was not at its maximum during indexing, the load decreases at the end
of the import. If the CPU load was at its maximum, it remains high afterwards.
Who can help me?


Re: Block until replication finishes

2014-04-01 Thread Fermin Silva
When trying to add the fix to the trunk version, I found that this was
already implemented.
There is a parameter 'wait' that does exactly that.

if (solrParams.getBool(WAIT, false)) {
  puller.join();
}

So the only possible way to do this in SOLR 3.x is to create a plugin with
a new replication handler (which I did) or re-compile SOLR.
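For the archives, a sketch of what the call looks like with that parameter
(host and handler path are illustrative; on 3.x this needs the patched or
custom handler):

  curl 'http://slave:8983/solr/replication?command=fetchindex&wait=true'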


On Tue, Apr 1, 2014 at 10:02 AM, Fermin Silva  wrote:

> The ReplicationHandler class is not the most exemplar code to be looking
> at.
> I found however the line that could be changed:
>
> new Thread() {
> @Override
> public void run() {
>   doFetch(paramsCopy);
> }
>   }.start();
> rsp.add(STATUS, OK_STATUS);
>
> It should be really simple to join on that thread depending on a rest
> parameter.
> I would change that code myself (which I did to my custom SOLR
> installation) but I guess the fix should go for SOLR 4.x and not 3.x.
> Sorry but I have no clue about how to contribute with code. Will check
> that but if someone can point me to the right direction it would be nice.
>
> Thanks
>
>
> On Sat, Mar 29, 2014 at 9:49 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> Hello,
>> We did this for our fork, if you are not happy with "RESTful polling", or
>> think that the synchronous replication handler might be useful, please
>> raise a jira.
>>  On 27.03.2014 at 17:35, "Fermin Silva"  wrote:
>>
>> > Hi,
>> >
>> > we are moving to native replication with SOLR 3.5.1.
>> > Because we want to control the replication from another program (a cron
>> > job), we decided to curl the slave to issue a fetchIndex command.
>> >
>> > The problem we have is that the curl returns immediately, while the
>> > replication still goes in the background.
>> > We need to know when the replication is done, and then resume the cron
>> job.
>> >
>> > Is there a way to block on the replication call until it's done similar
>> to
>> > waitForSearcher=true when committing ?
>> > If not, what other possibilities we have?
>> >
>> > Just in case, here is the solrconfig part in the slave (we pass
>> masterUrl
>> > in the curl url)
>> >
>> > 
>> > 
>> >   
>> > 
>> >   
>> >
>> >
>> > Many thanks in advance
>> >
>> > --
>> > Fermin Silva
>> >
>>
>
>
>
> --
> Fermin Silva
> Speed & Scalability Team
>



-- 
Fermin Silva
Speed & Scalability Team


Re: transaction log size

2014-04-01 Thread Shawn Heisey
On 4/1/2014 1:23 AM, Gurfan wrote:
> Hi,
> 
> As Transaction log(Tlog) play important role while restarting the SolrCloud
> cluster, we are trying to decrease the size. Many of the posts on net which
> we find describing that -
> "decreasing the AutoCommit and increasing autoSoftCommit would generate the
> small size of transaction log".

Transaction log size is purely controlled by hard commits (autoCommit);
soft commits have no influence at all.

> To test the aforesaid statement we executed some Run:
> 
> Document Size: ~2KB.
> 
> 1st Run:
> 
> AutoCommit: 30 Sec
> autoSoftCommit: 20 Sec
> openSearcher:  false
> Index size: 4.7 GB
> Transaction log:
>Master: 740KB
>Slave: 86 MB
> 
> 2nd Run:
> 
> AutoCommit: 20 Sec
> autoSoftCommit: 30 Sec
> openSearcher:  false
> Index size: 4.7 GB
> Transaction log:
>Master: 740KB
>Slave: 202 MB

When you say master and slave, are you using old-style replication, or
are you using SolrCloud?

With old-style replication, the slave should not be indexing *anything*
-- the index itself is copied from the master to the slave.  I don't
know whether transaction logs are copied by replication, but I suspect
that they are not.  If they are not, the slave should not have ANY
transaction logs.  If they are, the slave should be identical.  You
should be OK to delete the slave transaction logs.  It's entirely
possible that there is a bug.

With SolrCloud, master and slave have no meaning -- each shard has
replicas, and one of the replicas is elected to be leader.  An election
can happen at any time in response to cluster events, and a different
replica might be elected leader.

Although replication is required for SolrCloud operation, it is not used
except at node startup and if something goes wrong that requires index
recovery.  Each node does its own indexing and will manage its own
transaction logs according to how frequently you do a hard commit.
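For reference, the hard commit knob lives in solrconfig.xml; a minimal
sketch (the times are illustrative, not recommendations):

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>30000</maxTime>            <!-- hard commit; closes out the current tlog -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>20000</maxTime>            <!-- visibility only; no effect on tlog size -->
    </autoSoftCommit>
  </updateHandler>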

Thanks,
Shawn



Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
Elisabeth,

Yes, I believe you are right in that the deletes are part of the optimize
process. If you delete often, you may consider (if not already) the
TieredMergePolicy, which is suited for this scenario. Check out this
relevant discussion I had with Lucene committers:
https://twitter.com/DmitryKan/status/399820408444051456
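As a side note, expungeDeletes can be requested on an ordinary commit as
well; a sketch against a local core (URL is illustrative):

  curl 'http://localhost:8983/solr/update?commit=true&expungeDeletes=true'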

HTH,

Dmitry


On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit  wrote:

> Thanks a lot for your answers!
>
> Shawn. Our GC configuration has far less parameters defined, so we'll check
> this out.
>
> Dimitry, about the expungeDeletes option, we'll add that in the delete
> process. But from what I read, this is done in the optimize process (cf.
>
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html
> ).
> Or maybe not?
>
> Thanks again,
> Elisabeth
>
>
> 2014-04-01 7:52 GMT+02:00 Dmitry Kan :
>
> > Hi,
> >
> > We have noticed something like this as well, but with older versions of
> > solr, 3.4. In our setup we delete documents pretty often. Internally in
> > Lucene, when a document is client requested to be deleted, it is not
> > physically deleted, but only marked as "deleted". Our original
> optimization
> > assumption was such that the "deleted" documents would get physically
> > removed on each optimize command issued. We started to suspect it wasn't
> > always true as the shards (especially relatively large shards) became
> > slower over time. So we found out about the expungeDeletes option, which
> > purges the "deleted" docs and is by default false. We have set it to
> true.
> > If your solr update lifecycle includes frequent deletes, try this out.
> >
> > This of course does not override working towards finding better
> > GCparameters.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> >
> >
> > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> > elisaelisael...@gmail.com
> > > wrote:
> >
> > > Hello,
> > >
> > > We are currently using solr 4.2.1. Our index is updated on a daily
> basis.
> > > After noticing solr query time has increased (two times the initial
> size)
> > > without any change in index size or in solr configuration, we tried an
> > > optimize on the index but it didn't fix our problem. We checked the
> > garbage
> > > collector, but everything seemed fine. What did in fact fix our problem
> > was
> > > to delete all documents and reindex from scratch.
> > >
> > > It looks like over time our index gets "corrupted" and optimize doesn't
> > fix
> > > it. Does anyone have a clue how to investigate further this situation?
> > >
> > >
> > > Elisabeth
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


RE: Update single field through SolrJ

2014-04-01 Thread Jean-Sebastien Vachon
Hi,

Thanks for pointing me in the proper direction. I managed to change my code to
send atomic updates through SolrJ, but this morning we experienced something
weird. I sent a large batch of updates and deletes through SolrJ and our cloud
quickly became unusable and unresponsive (no leader for a shard, etc.).

We looked through the logs and could not find a particular reason for this. We
waited quite some time, but some nodes were not showing any progress in their
recovery, so we restarted them (we are running Tomcat 7.0.39) and everything
came back as if nothing had happened.

Has anyone experienced something similar? We are currently running Solr 4.6.1
on a 5-node cluster with both ZK 3.4.5 and Solr on them (ZK has its own
storage device to minimize the impact). Both are also running under JRE
1.7.0_21 in 64-bit mode. Our index has 5 shards with 2 replicas.

Thanks for your help
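For anyone finding this in the archives, here is a minimal sketch of the
atomic-update call we ended up sending (the URL, uniqueKey, and field names
are placeholders, not our real ones):

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");  // uniqueKey of the document to patch
    // The map key is the atomic operation: "set" replaces the value,
    // "add" appends to a multiValued field, "inc" increments a numeric one.
    doc.addField("title", Collections.singletonMap("set", "new title"));
    server.add(doc);
    server.commit();
    server.shutdown();
  }
}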

> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: March-28-14 3:21 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Update single field through SolrJ
> 
> On 3/28/2014 1:02 PM, Jean-Sebastien Vachon wrote:
> > I`d like to know how (it is possible) to update a field`s value using 
> > SolrJ. I
> looked at the API and could not figure it out so for now I'm using the
> UpdateHandler by sending it a JSON formatted document illustrating the
> required changes.
> >
> >
> > Is there a way to do the same through SolrJ?
> 
> The feature you are after is called Atomic Updates.  In order to use this
> feature *all* of your fields must be stored, except for copyField 
> destinations.
> See especially the "Caveats and Limitations" section of the first link below:
> 
> http://wiki.apache.org/solr/Atomic_Updates
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Docu
> ments
> 
> To do this with SolrJ, you must use a Map for the field value instead of just
> one or more regular values:
> 
> http://stackoverflow.com/questions/16234045/solr-how-to-use-the-new-
> field-update-modes-atomic-updates-with-solrj
> 
> Thanks,
> Shawn
> 
> 


Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-04-01 Thread Luis Lebolo
I got responses, but no easy solution to allow me to directly cancel a
request. The responses did point to:

   - the timeAllowed query parameter, which returns partial results (see the
   sketch after this list) -
   
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter
   - A possible hack that I never followed through -
   
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCANGii8eaSouePGxa7JfvOBhrnJUL++Ct4rQha2pxMefvaWhH=g...@mail.gmail.com%3E
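For reference, timeAllowed is an ordinary request parameter; a sketch with an
illustrative five-second budget:

  q=your+query&timeAllowed=5000

When the budget is exceeded, the response header is flagged with
partialResults=true.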

Maybe one of those will help you? If they do, make sure to report back!

-Luis


On Tue, Apr 1, 2014 at 3:13 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> So you too never got any response...
>
>
> On Mon, Mar 31, 2014 at 6:57 PM, Luis Lebolo 
> wrote:
>
> > Hi Salman,
> >
> > I was interested in something similar, take a look at the following
> thread:
> >
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201401.mbox/%3CCADSoL-i04aYrsOo2%3DGcaFqsQ3mViF%2Bhn24ArDtT%3D7kpALtVHzA%40mail.gmail.com%3E#archives
> >
> > I never followed through, however.
> >
> > -Luis
> >
> >
> > On Mon, Mar 31, 2014 at 6:24 AM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> > > Anyone?
> > >
> > >
> > > On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram <
> > > salman.ak...@northbaysolutions.net> wrote:
> > >
> > > > With reference to this thread<
> > >
> >
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E
> > >I
> > > wanted to know if there was any response to that or if Chris Harris
> > > > himself can comment on what he ended up doing, that would be great!
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Salman Akram
> > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>


Re: High CPU usage after import

2014-04-01 Thread Jack Krupansky
Some document types can consume significant CPU resources, such as large PDF 
files.


-- Jack Krupansky

-Original Message- 
From: Александр Вандышев

Sent: Tuesday, April 1, 2014 9:28 AM
To: Solr User
Subject: High CPU usage after import

I use a update/extract handler for indexing a large number of files. If 
during
indexing a CPU loads was not maximum at the end of import loading decreases. 
If
CPU loading was max then loading remain high. Who can help me? 



Re: Please help: Problems adding a document to the solr collection

2014-04-01 Thread Silvia Suárez
Thanks for your answer Alexandre!

S.

Silvia Suárez Barón
I+D+I

972 989 470 / s...@anpro21.com


*Technologies and SaaS for trademark analysis.*




2014-03-28 14:14 GMT+01:00 Alexandre Rafalovitch :

> Grep for uuid in config directory. It's probably in solrconfig.xml
> especially if you have dedup chain.
>
> Regards,
>  Alex
> On 28/03/2014 7:52 pm, "Silvia Suárez"  wrote:
>
> > Dear all:
> >
> > I'm trying to add a solr document into the solr collection. The code
> that I
> > am using is like this:
> >
> > public static void addDocuments (HttpSolrServer serverCore2)
> >  throws SolrServerException, IOException {
> >  SolrInputDocument doc1 = new SolrInputDocument();
> > doc1.addField( "c_noticia", "id1", 1.0f );
> > doc1.addField( "c_tipo", 1, 1.0f );
> > doc1.addField( "c_perfil", 10 );
> > serverCore2.add( doc1 );
> > serverCore2.commit();
> >
> > }
> >
> > However, I get next error when i execute my program:
> >
> > Exception in thread "main"
> > org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> > Document is missing mandatory uniqueKey field: uuid at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
> > at
> >
> >
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> > at
> >
> >
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at
> > org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) at
> > SearchLucene.addDocuments(SearchLucene.java:1047) at
> > SearchLucene.printOr(SearchLucene.java:1030) at
> > SearchLucene.SearchLuciernaga(SearchLucene.java:437) at
> > GetResults.main(GetResults.java:482)
> >
> >
> > My schema.xml file is like this:
> >
> > <field name="c_noticia" required="true" multiValued="false" />
> > <field name="c_tipo" indexed="true" stored="true" multiValued="true"/>
> > <field name="c_perfil" type="int" indexed="true" stored="true" multiValued="true"/>
> >
> > And the uniqueKey is like this:
> >
> > <uniqueKey>c_noticia</uniqueKey>
> >
> >
> > I don't understand what is the problem here.
> >
> > My uniqueKey is: c_noticia, it is not a uuid field.
> >
> >
> > Thanks a lot for some help in advance,
> >
> > Silvia.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > 2014-03-28 11:47 GMT+01:00 Alan Woodward :
> >
> > > Hi all,
> > >
> > > I have a few of questions about the context-aware
> > AnalyzingInfixSuggester:
> > > - is it possible to choose a specific field for the context at runtime
> > > (say, I want to limit suggestions by a field that I've already faceted
> > on),
> > > or is it limited to the hardcoded CONTEXTS_FIELD_NAME?
> > > - is the context-aware functionality exposed to Solr yet?
> > > - how difficult would it be to add similar functionality to the other
> > > suggesters, if say I only wanted to do prefix matching?
> > >
> > > Thanks,
> > >
> > > Alan Woodward
> > > www.flax.co.uk
> > >
> > >
> > >
> >
>


How to add a map of key/value pairs into a solr schema?

2014-04-01 Thread Silvia Suárez
Dear all,

I'm trying to add a map of key/value pairs into the solr schema, and I am
just wondering if it is possible.

For instance:

This is my schema.xml :

 
 
 
 
 


Is it possible to define a type=map (see the example above in the schema)
in the solr schema? For example, something like this:

map: 2252 / 23
 3789 / 12
 3790 / 21
 3794 / 19

And get a result like this:

 
62906367

  2252
  3789
  3790
  3794

  :
  :
  
  2252 / 23
  3789 / 54
  3790 / 21
  3794 / 12



I mean, is it possible to introduce a map into one document?

Thanks in advance for some help,

Silvia.


Re: How to add a map of key/value pairs into a solr schema?

2014-04-01 Thread Jack Krupansky
Not directly. The various workarounds depend on how you intend to access and 
query the values. What are your use cases?
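One common workaround, as a sketch (assuming integer values and that you want
each key queryable; the naming convention is illustrative): give every key its
own dynamic field in schema.xml:

  <dynamicField name="map_*" type="int" indexed="true" stored="true"/>

A document then carries map_2252=23, map_3789=12, and so on, and each key
becomes an ordinary field for querying and retrieval.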


-- Jack Krupansky

-Original Message- 
From: Silvia Suárez

Sent: Tuesday, April 1, 2014 12:29 PM
To: solr-user@lucene.apache.org
Subject: How to add a map of key/value pairs into a solr schema?

Dear all,

I'm trying to add a map of key/value pairs into the solr schema, and I just
wordering if it is possible.

For instance:

This is my schema.xml :








Is it possible to define a type= map (see the example above in the schema)
into the solr xchema?, for example something like this:

map: 2252 / 23
3789 / 12
3790 / 21
3794 / 19

And get a result like this:


   62906367
   
 2252
 3789
 3790
 3794
   
 :
 :
 
 2252 / 23
 3789 / 54
 3790 / 21
 3794 / 12
   


I mean, is it possible introduce a map into one document?

Thanks in advance for some help,

Silvia. 



Re: zookeeper reconnect failure

2014-04-01 Thread Jessica Mallet
Will do Mark. Thanks!


On Sun, Mar 30, 2014 at 1:29 PM, Mark Miller  wrote:

> We don't currently retry, but I don't think it would hurt much if we did -
> at least briefly.
>
> If you want to file a JIRA issue, that would be the best way to get it in
> a future release.
>
> --
> Mark Miller
> about.me/markrmiller
>
> On March 28, 2014 at 5:40:47 PM, Michael Della Bitta (
> michael.della.bi...@appinions.com) wrote:
>
> Hi, Jessica,
>
> We've had a similar problem when DNS resolution of our Hadoop task nodes
> has failed. They tend to take a dirt nap until you fix the problem
> manually. Are you experiencing this in AWS as well?
>
> I'd say the two things to do are to poll the node state via HTTP using a
> monitoring tool so you get an immediate notification of the problem, and to
> install some sort of caching server like nscd if you expect to have DNS
> resolution failures regularly.
>
>
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> "The Science of Influence Marketing"
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions<
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com 
>
>
> On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet  >wrote:
>
> > Hi,
> >
> > First off, I'd like to give a disclaimer that this probably is a very
> edge
> > case issue. However, since it happened to us, I would like to get some
> > advice on how to best handle this failure scenario.
> >
> > Basically, we had some network issue where we temporarily lost connection
> > and DNS. The zookeeper client properly triggered the watcher. However,
> when
> > trying to reconnect, this following Exception is thrown:
> >
> > 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line
> > 121) :java.net.UnknownHostException: : Name or
> > service not known
> > at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
> > at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
> > at
> > java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
> > at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
> > at java.net.InetAddress.getAllByName(InetAddress.java:1127)
> > at java.net.InetAddress.getAllByName(InetAddress.java:1063)
> > at
> >
> >
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
> at
> org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
> > at
> >
> >
> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
> > at
> >
> >
> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
> > at
> >
> >
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> > at
> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> >
> > I tried to look at the code and it seems that there'd be no further
> retries
> > to connect to Zookeeper, and the node is basically left in a bad state
> and
> > will not recover on its own. (Please correct me if I'm reading this
> wrong.)
> > Thinking about it, this is probably fair, since normally you wouldn't
> > expect retries to fix an "unknown host" issue--even though in our case it
> > would have--but I'm wondering what we should do to handle this situation
> if
> > it happens again in the future.
> >
> > Any advice is appreciated.
> >
> > Thanks,
> > Jessica
> >
>


Re: zookeeper reconnect failure

2014-04-01 Thread Jessica Mallet
Filed: https://issues.apache.org/jira/browse/SOLR-5945


On Tue, Apr 1, 2014 at 11:10 AM, Jessica Mallet wrote:

> Will do Mark. Thanks!
>
>
> On Sun, Mar 30, 2014 at 1:29 PM, Mark Miller wrote:
>
>> We don't currently retry, but I don't think it would hurt much if we did
>> - at least briefly.
>>
>> If you want to file a JIRA issue, that would be the best way to get it in
>> a future release.
>>
>> --
>> Mark Miller
>> about.me/markrmiller
>>
>> On March 28, 2014 at 5:40:47 PM, Michael Della Bitta (
>> michael.della.bi...@appinions.com) wrote:
>>
>> Hi, Jessica,
>>
>> We've had a similar problem when DNS resolution of our Hadoop task nodes
>> has failed. They tend to take a dirt nap until you fix the problem
>> manually. Are you experiencing this in AWS as well?
>>
>> I'd say the two things to do are to poll the node state via HTTP using a
>> monitoring tool so you get an immediate notification of the problem, and
>> to
>> install some sort of caching server like nscd if you expect to have DNS
>> resolution failures regularly.
>>
>>
>>
>> Michael Della Bitta
>>
>> Applications Developer
>>
>> o: +1 646 532 3062
>>
>> appinions inc.
>>
>> "The Science of Influence Marketing"
>>
>> 18 East 41st Street
>>
>> New York, NY 10017
>>
>> t: @appinions  | g+:
>> plus.google.com/appinions<
>> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
>> >
>> w: appinions.com 
>>
>>
>> On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet > >wrote:
>>
>> > Hi,
>> >
>> > First off, I'd like to give a disclaimer that this probably is a very
>> edge
>> > case issue. However, since it happened to us, I would like to get some
>> > advice on how to best handle this failure scenario.
>> >
>> > Basically, we had some network issue where we temporarily lost
>> connection
>> > and DNS. The zookeeper client properly triggered the watcher. However,
>> when
>> > trying to reconnect, this following Exception is thrown:
>> >
>> > 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java
>> (line
>> > 121) :java.net.UnknownHostException: : Name or
>> > service not known
>> > at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>> > at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
>> > at
>> > java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
>> > at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
>> > at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>> > at java.net.InetAddress.getAllByName(InetAddress.java:1063)
>> > at
>> >
>> >
>> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>> > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>> > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>> > at
>> > org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
>> > at
>> >
>> >
>> org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>> > at
>> >
>> >
>> org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>> > at
>> >
>> >
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>> > at
>> > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> >
>> > I tried to look at the code and it seems that there'd be no further
>> retries
>> > to connect to Zookeeper, and the node is basically left in a bad state
>> and
>> > will not recover on its own. (Please correct me if I'm reading this
>> wrong.)
>> > Thinking about it, this is probably fair, since normally you wouldn't
>> > expect retries to fix an "unknown host" issue--even though in our case
>> it
>> > would have--but I'm wondering what we should do to handle this
>> situation if
>> > it happens again in the future.
>> >
>> > Any advice is appreciated.
>> >
>> > Thanks,
>> > Jessica
>> >
>>
>
>


omitNorms and very short text fields

2014-04-01 Thread Walter Underwood
Just double-checking my understanding of omitNorms.

For very short text fields like personal names or titles, length normalization 
can give odd results. For example, we might want these two to score the same 
for the query "Cinderella".

* Cinderella
* Cinderella (Diamond Edition) (Blu-ray + DVD + Digital Copy) (Widescreen)

And these two for the query "chuck":

* Chuck House
* Chuck E. Cheese

I think that omitNorms=true on those fields will give that behavior. Is that
the right approach?
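For concreteness, a sketch of the kind of field definition I mean (the field
and type names are illustrative):

  <field name="title" type="text_short" indexed="true" stored="true" omitNorms="true"/>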

wunder
--
Walter Underwood
wun...@wunderwood.org





tf and very short text fields

2014-04-01 Thread Walter Underwood
And here is another peculiarity of short text fields.

The movie "New York, New York" should not be twice as relevant for the query 
"new york". Is there a way to use a binary term frequency rather than a count?

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Solr Search proposal

2014-04-01 Thread ahmed shawki



Hi All, Hi Furkan and Ahmet,

Thanks for your reply to my last email about the "Solr Search" proposal (sent
last Sunday, 30-Mar-2014).

This is just to announce "Solr Search", which is a simple HTML interface for
searching documents indexed by Apache Solr (TM). It was developed during the
last two months (in spare time), so this small HTML interface for Solr is far
from a complete or mature project. But its features and options might be
found useful by some users of Solr, which is why I am glad to share it here.

The code is hosted now at: https://code.google.com/p/solr-search-html/
On that page, a "quick overview" link about "Solr Search" can be found, as
well as a link for downloading it.

Thanks,
Best Regards,
Ahmed shawki
asha...@hotmail.com

Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with
k1 set to zero in your schema.
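A minimal sketch of that override, assuming Lucene/Solr 4.x and that
DefaultSimilarity is otherwise what you want (the class name is yours to
choose):

import org.apache.lucene.search.similarities.DefaultSimilarity;

// Binary term frequency: any tf greater than zero contributes exactly 1.
public class BinaryTFSimilarity extends DefaultSimilarity {
  @Override
  public float tf(float freq) {
    return freq > 0 ? 1f : 0f;
  }
}

It can then be referenced from schema.xml on the field type with
<similarity class="com.example.BinaryTFSimilarity"/>.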


Walter Underwood wrote:

And here is another peculiarity of short text fields.

The movie "New York, New York" should not be twice as relevant for the query 
"new york". Is there a way to use a binary term frequency rather than a count?

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: omitNorms and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, that will work. And combined with your other question, scores will always
be equal even if cinderella or chuck occurs more than once in one document.



Walter Underwood wrote:

Just double-checking my understanding of omitNorms.

For very short text fields like personal names or titles, length normalization 
can give odd results. For example, we might want these two to score the same 
for the query "Cinderella".

* Cinderella
* Cinderella (Diamond Edition) (Blu-ray + DVD + Digital Copy) (Widescreen)

And these two for the query "chuck":

* Chuck House
* Check E. Cheese

I think that omitNorm=true on those fields will give that behavior. Is that the 
right approach?

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Block until replication finishes

2014-04-01 Thread Mikhail Khludnev
On Tue, Apr 1, 2014 at 5:02 PM, Fermin Silva  wrote:

> Sorry but I have no clue about how to contribute with code. Will check that
> but if someone can point me to the right direction it would be nice.
>

You are welcome http://wiki.apache.org/solr/HowToContribute
Btw, cool finding re wait param! I didn't know it.


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Also, if I remember correctly, k1 set to zero for BM25 automatically omits
norms from the calculation. So that's easy to play with without reindexing.
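A sketch of the schema.xml form (Solr 4.x; b is left at its default here):

  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">0.0</float>
  </similarity>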


Markus Jelsma wrote:

Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with
k1 set to zero in your schema.


Walter Underwood wrote:

And here is another peculiarity of short text fields.

The movie "New York, New York" should not be twice as relevant for the query 
"new york". Is there a way to use a binary term frequency rather than a count?

wunder
--
Walter Underwood
wun...@wunderwood.org





Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Markus Jelsma
You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2 to
3 or 4. By default it may keep too many deleted or updated docs in the index.
This can increase index size by 50%!
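A sketch of where that knob lives in solrconfig.xml (Solr 4.x; the value is
illustrative):

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>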
Dmitry Kan wrote:

Elisabeth,

Yes, I believe you are right in that the deletes are part of the optimize
process. If you delete often, you may consider (if not already) the
TieredMergePolicy, which is suited for this scenario. Check out this
relevant discussion I had with Lucene committers:
https://twitter.com/DmitryKan/status/399820408444051456

HTH,

Dmitry


On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit  wrote:

> Thanks a lot for your answers!
>
> Shawn. Our GC configuration has far less parameters defined, so we'll check
> this out.
>
> Dimitry, about the expungeDeletes option, we'll add that in the delete
> process. But from what I read, this is done in the optimize process (cf.
>
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html
> ).
> Or maybe not?
>
> Thanks again,
> Elisabeth
>
>
> 2014-04-01 7:52 GMT+02:00 Dmitry Kan :
>
> > Hi,
> >
> > We have noticed something like this as well, but with older versions of
> > solr, 3.4. In our setup we delete documents pretty often. Internally in
> > Lucene, when a document is client requested to be deleted, it is not
> > physically deleted, but only marked as "deleted". Our original
> optimization
> > assumption was such that the "deleted" documents would get physically
> > removed on each optimize command issued. We started to suspect it wasn't
> > always true as the shards (especially relatively large shards) became
> > slower over time. So we found out about the expungeDeletes option, which
> > purges the "deleted" docs and is by default false. We have set it to
> true.
> > If your solr update lifecycle includes frequent deletes, try this out.
> >
> > This of course does not override working towards finding better
> > GCparameters.
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> >
> >
> > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> > elisaelisael...@gmail.com
> > > wrote:
> >
> > > Hello,
> > >
> > > We are currently using solr 4.2.1. Our index is updated on a daily
> basis.
> > > After noticing solr query time has increased (two times the initial
> size)
> > > without any change in index size or in solr configuration, we tried an
> > > optimize on the index but it didn't fix our problem. We checked the
> > garbage
> > > collector, but everything seemed fine. What did in fact fix our problem
> > was
> > > to delete all documents and reindex from scratch.
> > >
> > > It looks like over time our index gets "corrupted" and optimize doesn't
> > fix
> > > it. Does anyone have a clue how to investigate further this situation?
> > >
> > >
> > > Elisabeth
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


Re: tf and very short text fields

2014-04-01 Thread Walter Underwood
Thanks! We'll try that out and report back. I keep forgetting that I want to 
try BM25, so this is a good excuse.

wunder

On Apr 1, 2014, at 12:30 PM, Markus Jelsma  wrote:

> Also, if i remember correctly, k1 set to zero for bm25 automatically omits 
> norms in the calculation. So thats easy to play with without reindexing.
> 
> 
> Markus Jelsma wrote:
>
> Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25
> with k1 set to zero in your schema.
> 
> 
> Walter Underwood wrote:
>
> And here is another peculiarity of short text fields.
> 
> The movie "New York, New York" should not be twice as relevant for the query 
> "new york". Is there a way to use a binary term frequency rather than a count?
> 
> wunder
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org





Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-04-01 Thread Mikhail Khludnev
Hello Salman,
Let me drop a few thoughts on
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E

There are two aspects to this question:
1. dealing with long-running processing (thread divergence actions,
http://docs.oracle.com/javase/specs/jls/se5.0/html/memory.html#65310), and
2. the actual time checking.
"Terminating" or "aborting" a thread (2.) is just a way of tracking time
externally and sending interrupt(), which the thread should react to; they
don't do that now, so we return to the core issue (1.).

Solr's timeAllowed is the proper way to handle these things; the only
problem is that it assumes only the core search is long-running, while in
your case rewriting MultiTermQuery-s takes a huge amount of time.
Let's consider this problem. First of all, MultiTermQuery.rewrite() is
nearly a design issue: after a heavy rewrite occurs, the result is thrown
away once the search is done. I think the most straightforward way to
address this issue is by caching these expensive queries. Solr does it well:
http://wiki.apache.org/solr/CommonQueryParameters#fq However, that only
works for http://en.wikipedia.org/wiki/Conjunctive_normal_form -like
queries; there is a workaround that allows caching disjunction legs, see
http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html
If you still want to run expensively rewritten queries, you need to
implement a timeout check (similar to TimeLimitingCollector) for the
TermsEnum returned from MultiTermQuery.getTermsEnum(). Wrapping the actual
TermsEnum is the good way to do it: to apply queries that inject the
time-limiting wrapper TermsEnum, you might consider overriding methods like
SolrQueryParserBase.newWildcardQuery(Term) or post-processing the query tree
after parsing.
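To make that last point concrete, a rough sketch of such a wrapper; note this
is not an existing Lucene class, and the exception type and budget handling
are up to you:

import java.io.IOException;

import org.apache.lucene.index.FilteredTermsEnum;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Aborts term enumeration (and with it an expensive MultiTermQuery rewrite)
// once a wall-clock budget has been spent.
public class TimeLimitingTermsEnum extends FilteredTermsEnum {
  private final long deadlineMillis;

  public TimeLimitingTermsEnum(TermsEnum in, long budgetMillis) {
    super(in);
    this.deadlineMillis = System.currentTimeMillis() + budgetMillis;
  }

  @Override
  protected AcceptStatus accept(BytesRef term) throws IOException {
    if (System.currentTimeMillis() > deadlineMillis) {
      throw new RuntimeException("query rewrite exceeded its time budget");
    }
    return AcceptStatus.YES;
  }
}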



On Mon, Mar 31, 2014 at 2:24 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> Anyone?
>
>
> On Wed, Mar 26, 2014 at 7:55 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > With reference to this thread<
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200903.mbox/%3c856ac15f0903272054q2dbdbd19kea3c5ba9e105b...@mail.gmail.com%3E>I
> wanted to know if there was any response to that or if Chris Harris
> > himself can comment on what he ended up doing, that would be great!
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
> >
>
>
> --
> Regards,
>
> Salman Akram
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-01 Thread solr-user
Hi Eric.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you? I.e.,
does it return results if you search for "ft st john", "ft saint john",
and "fort st john"?

My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work
for all/some words in a phrase.





Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-01 Thread alxsss
It seems to me that you are missing this line

  <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true"/>

under

  <analyzer type="query">

Alex.

 

 

-Original Message-
From: solr-user 
To: solr-user 
Sent: Tue, Apr 1, 2014 5:01 pm
Subject: Re: how do I get search for "fort st john" to match "ft saint john"


Hi Eric.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal

if you have a simple one line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you?  i.e.
does it return results if you search for "ft st john" and "ft saint john"
and "fort st john"?  

My Solr 4.6.1 instance doesn't.  I am wondering if synonyms just don't work
for all/some words in a phrase




 


Re: eDismax parser and the mm parameter

2014-04-01 Thread William Bell
Fuzzy is provided; use ~.
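A sketch of what that looks like with edismax (the field name and edit
distances are illustrative):

  q=whte~2 sberia~2 ginsng~2&defType=edismax&qf=name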


On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:

> Jack ,
>
> Thanks a lot , I am now using the pf ,pf2 an pf3  and have gotten rid of
> the mm parameter from my queries, however for the fuzzy phrase queries , I
> am not sure how I would be able to leverage the Complex Query Parser there
> is absolutely nothing out there that gives me any idea as to how to do that
> .
>
> Why is fuzzy phrase search not provided by Solr OOB ? I am surprised
>
> Thanks.
>
>
> On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  >wrote:
>
> > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > (the default) and ignore the mm parameter. Give pf the highest boost, and
> > boost pf3 higher than pf2.
> >
> > You could try using the complex phrase query parser for the third case.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: S.L
> > Sent: Monday, March 31, 2014 12:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: eDismax parser and the mm parameter
> >
> > Thanks Jack , my use cases are as follows.
> >
> >
> >   1. Search for "Ginseng" everything related to ginseng should show up.
> >   2. Search For "White Siberian Ginseng" results with the whole phrase
> >   show up first followed by 2 words from the phrase followed by a single
> > word
> >   in the phrase
> >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> >   documents with White Siberian Ginseng Should show up , this looks like
> > the
> >   most complicated of all as Solr does not support fuzzy phrase searches
> .
> > (I
> >   have no solution for this yet).
> >
> > Thanks again!
> >
> >
> > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> >  The mm parameter is really only relevant when the default operator is OR
> >> or explicit OR operators are used.
> >>
> >> Again: Please provide your use case examples and your expectations for
> >> each use case. It really doesn't make a lot of sense to prematurely
> focus
> >> on a solution when you haven't clearly defined your use cases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: S.L
> >> Sent: Sunday, March 30, 2014 9:13 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: eDismax parser and the mm parameter
> >>
> >> Jack,
> >>
> >> I mis-stated the problem , I am not using the OR operator as default
> >> now(now that I think about it it does not make sense to use the default
> >> operator OR along with the mm parameter) , the reason I want to use pf
> and
> >> mm in conjunction is because of my understanding of the edismax parser
> and
> >> I have not looked into pf2 and pf3 parameters yet.
> >>
> >> I will state my understanding here below.
> >>
> >> Pf -  Is used to boost the result score if the complete phrase matches.
> >> mm <(less than) search term length would help limit the query results
>  to
> >> a
> >> certain number of better matches.
> >>
> >> With that being said would it make sense to have dynamic mm (set to the
> >> length of search term - 1)?
> >>
> >> I also have a question around using a fuzzy search along with eDismax
> >> parser , but I will ask that in a seperate post once I go thru that
> aspect
> >> of eDismax parser.
> >>
> >> Thanks again !
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> j...@basetechnology.com>
> >> wrote:
> >>
> >>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> >>
> >>> will be dwarfed.
> >>>
> >>> The general goal is to assure that the top documents really are the
> best,
> >>> not to necessarily limit the total document count. Focusing on the
> latter
> >>> could be a real waste of time.
> >>>
> >>> It's still not clear why or how you need or want to use OR as the
> default
> >>> operator - you still haven't given us a use case for that.
> >>>
> >>> To repeat: Give us a full set of use cases before taking this XY
> Problem
> >>> approach of pursuing a solution before the problem is understood.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: S.L
> >>> Sent: Sunday, March 30, 2014 6:14 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: eDismax parser and the mm parameter
> >>>
> >>> Jacks Thanks Again,
> >>>
> >>> I am searching  Chinese medicine  documents , as the example I gave
> >>> earlier
> >>> a user can search for "Ginseng" or Siberian Ginseng or Red Siberian
> >>> Ginseng
> >>> , I certainly want to use pf parameter (which is not driven by mm
> >>> parameter) , however for giving higher score to documents that have
> more
> >>> of
> >>> the terms I want to use edismax now if I give a mm of 3 and the search
> >>> term
> >>> is of only length 1 (like "Ginseng") what does edisMax do ?
> >>>
> >>>
> >>> On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky <
> j...@basetechnology.com
> >>> >
> >>> wrote:
> >>>
> >>>  It still depends on your objective - which you haven't told us yet.
> Show
> >>>
> >>>  us some use cases

The word "no" in a query

2014-04-01 Thread Bob Laferriere

I have built a commerce search engine. I am struggling with the word “no” in
queries. We have products that are “No Smoking Sign.” When the query is
“Smoking AND Sign” the product is found. If I query as “No AND Sign” I get no
results. I do not have “no” as a stop word. Any ideas why I would get zero
results back?

Regards,

Bob

Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Dmitry Kan
Thanks, Markus, that is useful.
I'm guessing the higher the weight, the longer the op takes?


On Tue, Apr 1, 2014 at 10:39 PM, Markus Jelsma
wrote:

> You may want to increase reclaimdeletesweight for tieredmergepolicy from 2
> to 3 or 4. By default it may keep too much deleted or updated docs in the
> index. This can increase index size by 50%!! Dmitry Kan <
> solrexp...@gmail.com> wrote:
>
> Elisabeth,
>
> Yes, I believe you are right in that the deletes are part of the optimize
> process. If you delete often, you may consider (if not already) the
> TieredMergePolicy, which is suited for this scenario. Check out this
> relevant discussion I had with Lucene committers:
> https://twitter.com/DmitryKan/status/399820408444051456
>
> HTH,
>
> Dmitry
>
>
> On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Thanks a lot for your answers!
> >
> > Shawn. Our GC configuration has far less parameters defined, so we'll
> check
> > this out.
> >
> > Dimitry, about the expungeDeletes option, we'll add that in the delete
> > process. But from what I read, this is done in the optimize process (cf.
> >
> >
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html
> > ).
> > Or maybe not?
> >
> > Thanks again,
> > Elisabeth
> >
> >
> > 2014-04-01 7:52 GMT+02:00 Dmitry Kan :
> >
> > > Hi,
> > >
> > > We have noticed something like this as well, but with older versions of
> > > solr, 3.4. In our setup we delete documents pretty often. Internally in
> > > Lucene, when a document is client requested to be deleted, it is not
> > > physically deleted, but only marked as "deleted". Our original
> > optimization
> > > assumption was such that the "deleted" documents would get physically
> > > removed on each optimize command issued. We started to suspect it
> wasn't
> > > always true as the shards (especially relatively large shards) became
> > > slower over time. So we found out about the expungeDeletes option,
> which
> > > purges the "deleted" docs and is by default false. We have set it to
> > true.
> > > If your solr update lifecycle includes frequent deletes, try this out.
> > >
> > > This of course does not override working towards finding better
> > > GCparameters.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> > >
> > >
> > > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> > > elisaelisael...@gmail.com
> > > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We are currently using solr 4.2.1. Our index is updated on a daily
> > basis.
> > > > After noticing solr query time has increased (two times the initial
> > size)
> > > > without any change in index size or in solr configuration, we tried
> an
> > > > optimize on the index but it didn't fix our problem. We checked the
> > > garbage
> > > > collector, but everything seemed fine. What did in fact fix our
> problem
> > > was
> > > > to delete all documents and reindex from scratch.
> > > >
> > > > It looks like over time our index gets "corrupted" and optimize
> doesn't
> > > fix
> > > > it. Does anyone have a clue how to investigate further this
> situation?
> > > >
> > > >
> > > > Elisabeth
> > > >
> > >
> > >
> > >
> > > --
> > > Dmitry
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > >
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan