Re: How to add a map of key/value pairs into a solr schema?

2014-04-02 Thread Silvia Suárez
Dear Jack

I'm using SolrJ to access and query the values in the solr collection,

For example,

I have a collection in Solr in which I am updating the c_perfil
multivalued field, using this code:

SolrInputDocument sdoc = new SolrInputDocument();
sdoc.addField("c_noticia", doc.getFieldValue("c_noticia").toString());
Map<String, Object> fieldModifier = new HashMap<String, Object>(1);
fieldModifier.put("add", perfil);
sdoc.addField("c_perfil", fieldModifier);  // add the map as the field value
server3.add(sdoc);   // send it to the solr server
server3.commit();

The result is:

<arr name="c_perfil">
  <str>2252</str>
  <str>3789</str>
  <str>3790</str>
  <str>3794</str>
</arr>

And it is working ok,


In this sense, is it possible to update another field type, like a map
(key/value), using SolrJ?

For example, something like this:

<arr name="c_perfil">
  <str>2252 / 23</str>
  <str>3789 / 54</str>
  <str>3790 / 21</str>
  <str>3794 / 12</str>
</arr>
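
For reference, one common workaround for key/value data (an assumption here, not something confirmed in this thread) is to flatten the map into dynamic fields, one field per key. A minimal SolrJ sketch; the perfil_*_i field pattern is hypothetical and would need a matching <dynamicField> in schema.xml:

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class KeyValueUpdate {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument sdoc = new SolrInputDocument();
        sdoc.addField("c_noticia", "62906367");

        // Flatten the map: key 2252 with value 23 becomes field perfil_2252_i = 23.
        Map<String, Integer> perfil = new HashMap<String, Integer>();
        perfil.put("2252", 23);
        perfil.put("3789", 54);

        for (Map.Entry<String, Integer> e : perfil.entrySet()) {
            // Same atomic-update pattern as above, with "set" instead of "add".
            Map<String, Object> fieldModifier = new HashMap<String, Object>(1);
            fieldModifier.put("set", e.getValue());
            sdoc.addField("perfil_" + e.getKey() + "_i", fieldModifier);
        }

        server.add(sdoc);
        server.commit();
    }
}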

Thanks a lot in advance,

Silvia.




Silvia Suárez Barón
I+D+I

972 989 470 / s...@anpro21.com

2014-04-01 18:35 GMT+02:00 Jack Krupansky :

> Not directly. The various workarounds depend on how you intend to access
> and query the values. What are your use cases?
>
> -- Jack Krupansky
>
> -Original Message- From: Silvia Suárez
> Sent: Tuesday, April 1, 2014 12:29 PM
> To: solr-user@lucene.apache.org
> Subject: How to add a map of key/value pairs into a solr schema?
>
> Dear all,
>
> I'm trying to add a map of key/value pairs into the Solr schema, and I was
> just wondering if it is possible.
>
> For instance:
>
> This is my schema.xml :
>
> <field name="..." type="..." required="true" multiValued="false" />
> <field name="..." type="..." multiValued="false" />
> <field name="..." type="..." multiValued="true" />
> <field name="..." type="..." multiValued="false" />
> <field name="..." type="..." multiValued="true" />
>
>
> Is it possible to define a map type (see the example above) in the Solr
> schema? For example, something like this:
>
> map: 2252 / 23
> 3789 / 12
> 3790 / 21
> 3794 / 19
>
> And get a result like this:
>
> <doc>
>   <str name="c_noticia">62906367</str>
>   <arr name="c_perfil">
>     <str>2252</str>
>     <str>3789</str>
>     <str>3790</str>
>     <str>3794</str>
>   </arr>
>   ...
>   <map name="...">
>     2252 / 23
>     3789 / 54
>     3790 / 21
>     3794 / 12
>   </map>
> </doc>
>
> I mean, is it possible to introduce a map into one document?
>
> Thanks in advance for some help,
>
> Silvia.
>


Re: Product index schema for solr

2014-04-02 Thread Ajay Patel


As per your suggestion my final schema will be like:
{
id:
...
...
[PRODUCT RELATED DATAS]
...
...
...
min_qty: 1
max_qty: 50
price: 4
}


[OTHER SAME LIKE ABOVE DATA]



Now I want to create a range facet field by combining min_qty and max_qty
(see the sketch below).

I hope you have understood what I want to say :).
Thanks a lot in advance :)
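
For illustration, one way to facet over the denormalized min_qty/max_qty documents is facet.query, one query per bucket. A sketch only; the bucket boundaries and collection URL are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QtyFacetSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        // A doc counts for a bucket when its [min_qty, max_qty] range overlaps it.
        q.addFacetQuery("min_qty:[* TO 50] AND max_qty:[1 TO *]");     // bucket 1-50
        q.addFacetQuery("min_qty:[* TO 100] AND max_qty:[51 TO *]");   // bucket 51-100
        q.addFacetQuery("min_qty:[* TO 200] AND max_qty:[101 TO *]");  // bucket 101-200

        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getFacetQuery()); // map of facet query -> count
    }
}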


Thanks & Regards
Ajay Patel.



On Mon, Mar 31, 2014 at 8:42 AM, Ajay Patel  
wrote:

On Monday 31 March 2014 06:07 PM, Erick Erickson wrote:

What do you mean by "generalized range facet"? How would
they differ from standard range faceting? Details are important...

Best.
Erick

On Mon, Mar 31, 2014 at 7:44 AM, Ajay Patel 
 wrote:


Hi Erick
Thanks for the reply :). Your solution helped me to denormalize my
data. Now I have another question: can I create a generalized range facet
according to min_qty and max_qty?

Thanks & Regards
Ajay Patel.


On Saturday 29 March 2014 08:54 PM, Erick Erickson wrote:

The usual approach is to de-normalize the tables, so you'd store 
docs like

(all your product data) min_qty, max_qty, price_per_qty

So the above example would have 4 documents, then it all "just works"

You have to ensure that the id (<uniqueKey>) is different for each, and
probably store the product ID in a field other than "id" for this
reason.


Best,
Erick

On Fri, Mar 28, 2014 at 10:27 AM, Ajay Patel 
wrote:

Hi Solr user & developers.

I am new to the world of the Solr search engine. I have a complex product
database structure in Postgres.

A product has many product_quantity_price attributes, each covering a range.

E.g. for product ID 1, the price range is stored in the product_quantity_price
table in the following manner:

min_qty  max_qty  price_per_qty
1        50       4
51       100      3.5
101      150      3
151      200      2.5

The range is not fixed for any product; it can be different for different
products.

Now my question is: how can I save this data in Solr in an optimized
way so that I can create facets on qty and prices?

Thanks in advance.
Ajay Patel.

Velocity template examples and hardcoded contextPath

2014-04-02 Thread Thomas Pii
The current Velocity template examples in the 4.6.1 distribution have a
hard-coded context path for the Solr web application:
#macro(url_root)/solr#end
in VM_global_library.vm hardcodes it to /solr

I would like to change this to determine the context path at run time, so
the templates do not require modifications if deployed to a different
context path.

Does anyone have any experience with this?

I have found LinkTool in Velocity which has the method getContextPath(), but
I am unsure if it can be used and how to use it if so.
I was thinking of something like:
#macro(url_root)$link.contextPath#end

So far my attempts have failed and I am unsure how to access it and the
right syntax for it.

Whatever I put in the url_root macro just ends up as a literal string in the
generated HTML.

I appreciate any pointer you can give me.
Regards
Thomas





Re: transaction log size

2014-04-02 Thread Gurfan
Thanks Shawn for the quick reply.

We are using Solr Cloud version 4.6.1 

Usually we see a higher transaction log on the replica; the leader's tlog size
is in KBs. We also tried keeping the hard commit (autoCommit) at 20 sec and
autoSoftCommit at 30 sec.

We wrote a script to monitor the disk usage of the tlog directory at 1-minute
intervals, and noticed that the logs are purged at particular times. For
instance: the tlog starts at ~4MB and increases at some point to
20MB, 50MB, 220MB, 600MB, then it shrinks back to ~10MB.

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

How can we keep the tlog size at its lowest, so that our system restart
time will be less?

Thanks,
--Gurfan  








Re: Re: solr 4.2.1 index gets slower over time

2014-04-02 Thread elisabeth benoit
This sounds interesting, I'll check this out.

Thanks!
Elisabeth


2014-04-02 8:54 GMT+02:00 Dmitry Kan :

> Thanks, Markus, that is useful.
> I'm guessing the higher the weight, the longer the op takes?
>
>
> On Tue, Apr 1, 2014 at 10:39 PM, Markus Jelsma
> wrote:
>
> > You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2
> > to 3 or 4. By default it may keep too many deleted or updated docs in the
> > index. This can increase index size by 50%!!
> >
> > Dmitry Kan <solrexp...@gmail.com> wrote:
> >
> > Elisabeth,
> >
> > Yes, I believe you are right in that the deletes are part of the optimize
> > process. If you delete often, you may consider (if not already) the
> > TieredMergePolicy, which is suited for this scenario. Check out this
> > relevant discussion I had with Lucene committers:
> > https://twitter.com/DmitryKan/status/399820408444051456
> >
> > HTH,
> >
> > Dmitry
> >
> >
> > On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit <
> > elisaelisael...@gmail.com
> > > wrote:
> >
> > > Thanks a lot for your answers!
> > >
> > > Shawn. Our GC configuration has far less parameters defined, so we'll
> > check
> > > this out.
> > >
> > > Dimitry, about the expungeDeletes option, we'll add that in the delete
> > > process. But from what I read, this is done in the optimize process
> (cf.
> > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html
> > > ).
> > > Or maybe not?
> > >
> > > Thanks again,
> > > Elisabeth
> > >
> > >
> > > 2014-04-01 7:52 GMT+02:00 Dmitry Kan :
> > >
> > > > Hi,
> > > >
> > > > We have noticed something like this as well, but with older versions
> of
> > > > solr, 3.4. In our setup we delete documents pretty often. Internally in
> > > > Lucene, when a client requests that a document be deleted, it is not
> > > > physically deleted, but only marked as "deleted". Our original
> > > optimization
> > > > assumption was such that the "deleted" documents would get physically
> > > > removed on each optimize command issued. We started to suspect it
> > wasn't
> > > > always true as the shards (especially relatively large shards) became
> > > > slower over time. So we found out about the expungeDeletes option,
> > which
> > > > purges the "deleted" docs and is by default false. We have set it to
> > > true.
> > > > If your solr update lifecycle includes frequent deletes, try this
> out.
> > > >
> > > > This of course does not override working towards finding better
> > > > GCparameters.
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> > > >
> > > >
> > > > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <
> > > > elisaelisael...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We are currently using solr 4.2.1. Our index is updated on a daily
> > > basis.
> > > > > After noticing solr query time has increased (two times the initial
> > > > > value)
> > > > > without any change in index size or in solr configuration, we tried
> > an
> > > > > optimize on the index but it didn't fix our problem. We checked the
> > > > garbage
> > > > > collector, but everything seemed fine. What did in fact fix our
> > problem
> > > > was
> > > > > to delete all documents and reindex from scratch.
> > > > >
> > > > > It looks like over time our index gets "corrupted" and optimize
> > doesn't
> > > > fix
> > > > > it. Does anyone have a clue how to investigate further this
> > situation?
> > > > >
> > > > >
> > > > > Elisabeth
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Dmitry
> > > > Blog: http://dmitrykan.blogspot.com
> > > > Twitter: http://twitter.com/dmitrykan
> > > >
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>


Issue with solr searching : words with "-" not able to search

2014-04-02 Thread Priti Solanki
Hello friends,

I have got one issue

I am trying to search for "X-Ray Machine".

Now Solr is returning multiple rows even when I am doing an exact search [on
the Solr server directly].

Secondly, I am using a PHP client to talk to Solr, but for some reason I
can't search for "X-Ray Machine"; the Solr response breaks.

Someone suggested (X AND Ray) Machine in advanced search, but this doesn't
sound very feasible to me.

As a user I would like to search for "X-Ray Machine" OR "xray machine".

Can someone guide me on this, please?
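
A likely direction (an assumption until the schema is posted) is a field type whose analysis chain splits on the hyphen while also keeping the joined form, so "X-Ray" is indexed as "x", "ray" and the catenated "xray" and both spellings match. A sketch using standard Solr factories:

<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts splits "X-Ray" into "X", "Ray";
         catenateWords also indexes the joined "XRay" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>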

Regards,
Priti Solanki


Re: Issue with solr searching : words with "-" not able to search

2014-04-02 Thread Alexandre Rafalovitch
What's your field type definition where your X-Ray string is stored?

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Wed, Apr 2, 2014 at 3:19 PM, Priti Solanki  wrote:
> Hello friends,
>
> I have got one issue
>
> I am trying to search for "X-Ray Machine".
>
> Now Solr is returning multiple rows even when I am doing an exact search [on
> the Solr server directly].
>
> Secondly, I am using a PHP client to talk to Solr, but for some reason I
> can't search for "X-Ray Machine"; the Solr response breaks.
>
> Someone suggested (X AND Ray) Machine in advanced search, but this doesn't
> sound very feasible to me.
>
> As a user I would like to search for "X-Ray Machine" OR "xray machine".
>
> Can someone guide me on this, please?
>
> Regards,
> Priti Solanki


Re: Block Join Parent Query across children docs

2014-04-02 Thread mertens
Hi Mikhail,

Thanks for your response. Here is an example of what I'm trying to do. If I
had the following documents:


  10
  parent
  User1
  
11
item1, item6
  
  
12
item2, item7
  
  
13
item3, item8
  


  20
  parent
  user2
  
21
item1, item6
  
  
22
item2, item7
  
  
23
item8
  


I would like to do a search for users with item1 and item2 and not item3,
and that query should only return user2. I have tried this with a block
join query with solr 4.6.1 and it does not work the way I need it to. If
you have any ideas let me know.

Thanks,
Luke


On Sat, Mar 29, 2014 at 1:46 PM, Mikhail Khludnev [via Lucene] <
ml-node+s472066n4127896...@n3.nabble.com> wrote:

> Hello Luke,
>
> If I get you right, you need to combine parent (block join) queries e.g
> users who have a record with item1 AND users who have a record with item2.
>
> Does it make sense? If it does, do you need to figure out a syntax?
> 28.03.2014 14:19 пользователь "mertens" <[hidden 
> email]>
> написал:
>
> > Hello Solr Users,
> >
> > In my system I have multiple records belonging to users, and I need to
> > perform a query to find users who have records that meet the criteria of
> > that query. For example, if my record has the field "search" and I query
> > for
> > search:((item1 AND item2) NOT item3), I want to find all users that have
> > one
> > or more records with item1 and one or more records with item2 but no
> > records
> > containing item3.
> >
> > I have investigated the block join parent query which comes close to the
> > functionality that I need, but it appears to apply the entire query to
> each
> > individual child document, rather than across all child documents.
> >
> > At the moment the only solutions I can think of are to combine all the
> user
> > records into one giant document for each user or do some sort of OR
> query
> > to
> > get all documents with partial matches for each user and then manually
> > verify that my result document set satisfies my criteria. Neither of
> these
> > solutions sounds very attractive to me. Does anyone else have any advice
> or
> > recommendations for this scenario?
> >
> > Thanks,
> > Luke
> >
> >
> >
> >
>
>





Suspicious Object.wait in UnInvertedField.getUnInvertedField

2014-04-02 Thread adfel70
While debugging a problem where 400 threads were waiting for a single lock we
traced the issue to the getUnInvertedField method. 

public static UnInvertedField getUnInvertedField(String field,
    SolrIndexSearcher searcher) throws IOException {
  SolrCache<String, UnInvertedField> cache = searcher.getFieldValueCache();
  if (cache == null) {
    return new UnInvertedField(field, searcher);
  }

  UnInvertedField uif = null;
  Boolean doWait = false;
  synchronized (cache) {
    uif = cache.get(field);
    if (uif == null) {
      // This thread will load this field, don't let other threads try.
      cache.put(field, uifPlaceholder);
    } else {
      if (uif.isPlaceholder == false) {
        return uif;
      }
      // Someone else has put the placeholder in, wait for that to complete.
      doWait = true;
    }
  }

  while (doWait) {
    try {
      synchronized (cache) {
        // Should at least return the placeholder, NPE if not is OK.
        uif = cache.get(field);
        if (uif.isPlaceholder == false) {
          // OK, another thread put this in the cache, we should be good.
          return uif;
        }
        cache.wait();
      }
    } catch (InterruptedException e) {
      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
          "Thread interrupted in getUninvertedField.");
    }
  }

  uif = new UnInvertedField(field, searcher);
  synchronized (cache) {
    cache.put(field, uif); // Note, this cleverly replaces the placeholder.
    cache.notifyAll();
  }

  return uif;
}

It seems that the code is waiting on the same object it is synchronized on,
and thus the notifyAll call may never happen, since it requires re-obtaining
the lock...

Am I missing something here, or is this a real bug?
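
For context, Object.wait() releases the monitor it is called on while the thread waits, and re-acquires it before returning, so another thread can still enter a synchronized(cache) block and call notifyAll(). A minimal standalone sketch of the same pattern (not Solr code):

public class WaitNotifyDemo {
    private static final Object lock = new Object();
    private static boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(new Runnable() {
            public void run() {
                synchronized (lock) {
                    while (!ready) {
                        try {
                            lock.wait(); // releases 'lock' here, re-acquires on wake-up
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                    System.out.println("woke up");
                }
            }
        });
        waiter.start();

        Thread.sleep(100); // crude way to let the waiter block first (demo only)
        synchronized (lock) { // succeeds because wait() released the lock
            ready = true;
            lock.notifyAll();
        }
    }
}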





Re: Issue with solr searching : words with "-" not able to search

2014-04-02 Thread Ajay Patel

Can you please share the schema.xml used for this Solr instance?

On Wednesday 02 April 2014 01:49 PM, Priti Solanki wrote:

Hello friends,

I have got one issue

I am trying to search for "X-Ray Machine".

Now Solr is returning multiple rows even when I am doing an exact search [on
the Solr server directly].

Secondly, I am using a PHP client to talk to Solr, but for some reason I
can't search for "X-Ray Machine"; the Solr response breaks.

Someone suggested (X AND Ray) Machine in advanced search, but this doesn't
sound very feasible to me.

As a user I would like to search for "X-Ray Machine" OR "xray machine".

Can someone guide me on this, please?

Regards,
Priti Solanki





RE: Where to specify numShards when startup up a cloud setup

2014-04-02 Thread zzT
It seems that I've figured out a "configuration approach" to this issue.

I'm having the exact same issue and the only viable solutions found on the
net till now are
1) Pass -DnumShards=x when starting up Solr server
2) Use the Collections API as indicated by Shawn.

What I've noticed though - after making the call to /collections to create a
collection - is that a new <core> entry is added inside solr.xml with the
attribute "numShards".

So, right now I'm configuring solr.xml with the numShards attribute inside my
<core> nodes. This way I don't have to worry about the annoying stuff you've
already mentioned, e.g. waiting for Solr to start up etc.

Of course the same logic applies here: the numShards param is meaningful only
the first time. Even if you change it at a later point, the # of shards stays
the same.





split existing indexes into shards

2014-04-02 Thread Gastone Penzo
Hello,
I have 2 shards with 2 replicas on 4 different nodes, like this scheme:


server1
--
shard1 master

server2
-
shard 2 master

server 3

replica of shard1

server 4
---
replica of shard2



I have existing indexes only in shard1 and I want to split them
automatically into shard2, without the SPLITSHARD API, because it would
create another shard on the same server (shard1_1).
I only want to automatically move the indexes. Is it possible?

-- 
*Gastone Penzo*


Return Solr docs in a specific order by list of ids

2014-04-02 Thread marotosg
Hi,

I have a use case where I have a list of doc ids and I need to return
documents from Solr in the same order as my list of ids.

For instance:
459,185,569,8,1,896

Is it possible to have Solr return the docs in that same order?

Regards,
Sergio





Re: sort by an attribute values sequence

2014-04-02 Thread santosh sidnal
Re-sending my e-mail. Any pointers/links on this issue would help me a lot.

Thanks in advance.


On Tue, Apr 1, 2014 at 4:25 PM, santosh sidnal wrote:

> Hi All,
>
> We have a specific requirement of sorting the products as per a specific
> attribute value sequence. Any pointer or source of info would help us.
>
> Example of the scenario;
>
> Let's say for a search result I want to sort results based on an attribute
> producttype, where producttype has the following values: A, B, C, D.
>
> In the solr query I can give either producttype asc or producttype desc.
>
> But I want to get results in a specific order, by saying: first give me all
> results with value 'C', then B, A, D.
>
>
> --
> Regards,
> Santosh Sidnal
>
>


-- 
Regards,
Santosh Sidnal


Re: The word "no" in a query

2014-04-02 Thread François Schiettecatte
Have you looked at the debugging output?

http://wiki.apache.org/solr/CommonQueryParameters#Debugging
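
For example, a query with debugging enabled might look like this (host and collection name are illustrative):

http://localhost:8983/solr/collection1/select?q=No+AND+Sign&debugQuery=true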

François

On Apr 2, 2014, at 1:37 AM, Bob Laferriere  wrote:

> 
> I have built a commerce search engine. I am struggling with the word “no” in
> queries. We have products that are “No Smoking Sign.” When the query is
> “Smoking AND Sign” the product is found. If I query as “No AND Sign” I get no
> results. I do not have “no” as a stop word. Any ideas why I would get zero
> results back?
> 
> Regards,
> 
> Bob





Re: Return Solr docs in a specific order by list of ids

2014-04-02 Thread Alexandre Rafalovitch
Most anything is possible, but maybe not out of the box.

A custom post filter?
On 02/04/2014 5:47 pm, "marotosg"  wrote:

> Hi,
>
> I have a use case where I have a list of doc ids and I need to return
> documents from Solr in the same order as my list of ids.
>
> For instance:
> 459,185,569,8,1,896
>
> Is it possible to have Solr return the docs in that same order?
>
> Regards,
> Sergio
>
>
>
>


Re: eDismax parser and the mm parameter

2014-04-02 Thread simpleliving...@gmail.com
It only works for a single-word search term, not for multi-word search terms.

Sent from my HTC

- Reply message -
From: "William Bell" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 12:03 AM

Fuzzy is provided use ~


On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:

> Jack,
>
> Thanks a lot, I am now using pf, pf2 and pf3 and have gotten rid of
> the mm parameter in my queries. However, for the fuzzy phrase queries, I
> am not sure how I would be able to leverage the Complex Query Parser; there
> is absolutely nothing out there that gives me any idea as to how to do
> that.
>
> Why is fuzzy phrase search not provided by Solr OOB? I am surprised.
>
> Thanks.
>
>
> On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  >wrote:
>
> > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > (the default) and ignore the mm parameter. Give pf the highest boost, and
> > boost pf3 higher than pf2.
> >
> > You could try using the complex phrase query parser for the third case.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: S.L
> > Sent: Monday, March 31, 2014 12:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: eDismax parser and the mm parameter
> >
> > Thanks Jack , my use cases are as follows.
> >
> >
> >   1. Search for "Ginseng" everything related to ginseng should show up.
> >   2. Search For "White Siberian Ginseng" results with the whole phrase
> >   show up first followed by 2 words from the phrase followed by a single
> > word
> >   in the phrase
> >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> >   documents with White Siberian Ginseng Should show up , this looks like
> > the
> >   most complicated of all as Solr does not support fuzzy phrase searches
> .
> > (I
> >   have no solution for this yet).
> >
> > Thanks again!
> >
> >
> > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> >  The mm parameter is really only relevant when the default operator is OR
> >> or explicit OR operators are used.
> >>
> >> Again: Please provide your use case examples and your expectations for
> >> each use case. It really doesn't make a lot of sense to prematurely
> focus
> >> on a solution when you haven't clearly defined your use cases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: S.L
> >> Sent: Sunday, March 30, 2014 9:13 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: eDismax parser and the mm parameter
> >>
> >> Jack,
> >>
> >> I mis-stated the problem , I am not using the OR operator as default
> >> now(now that I think about it it does not make sense to use the default
> >> operator OR along with the mm parameter) , the reason I want to use pf
> and
> >> mm in conjunction is because of my understanding of the edismax parser
> and
> >> I have not looked into pf2 and pf3 parameters yet.
> >>
> >> I will state my understanding here below.
> >>
> >> Pf -  Is used to boost the result score if the complete phrase matches.
> >> mm <(less than) search term length would help limit the query results
>  to
> >> a
> >> certain number of better matches.
> >>
> >> With that being said would it make sense to have dynamic mm (set to the
> >> length of search term - 1)?
> >>
> >> I also have a question around using a fuzzy search along with eDismax
> >> parser , but I will ask that in a seperate post once I go thru that
> aspect
> >> of eDismax parser.
> >>
> >> Thanks again !
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> j...@basetechnology.com>
> >> wrote:
> >>
> >>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> >>
> >>> will be dwarfed.
> >>>
> >>> The general goal is to assure that the top documents really are the
> best,
> >>> not to necessarily limit the total document count. Focusing on the
> latter
> >>> could be a real waste of time.
> >>>
> >>> It's still not clear why or how you need or want to use OR as the
> default
> >>> operator - you still haven't given us a use case for that.
> >>>
> >>> To repeat: Give us a full set of use cases before taking this XY
> Problem
> >>> approach of pursuing a solution before the problem is understood.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: S.L
> >>> Sent: Sunday, March 30, 2014 6:14 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: eDismax parser and the mm parameter
> >>>
> >>> Jacks Thanks Again,
> >>>
> >>> I am searching  Chinese medicine  documents , as the example I gave
> >>> earlier
> >>> a user can search for "Ginseng" or Siberian Ginseng or Red Siberian
> >>> Ginseng
> >>> , I certainly want to use pf parameter (which is not driven by mm
> >>> parameter) , however for giving higher score to documents that have
> more
> >>> of
> >>> the terms I want to use edismax now if I give a mm of 3 and the search
> >>> term
> >>> is of only length 1 (like "Ginseng

Flush buffer exceptions

2014-04-02 Thread ku3ia
Hi all!
I'm using Solr 4.6.0 and Jetty 8. Sometimes these errors and warnings appear
in Jetty's logs:

ERROR - 2014-03-27 17:11:15.022; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:523)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:147)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
...
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:310)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:853)
... 35 more
ERROR - 2014-03-27 17:11:15.022; org.apache.solr.common.SolrException;
null:org.eclipse.jetty.io.EofException
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:523)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:147)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at
org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:214)
at
org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:207)
...
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:310)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:853)
... 35 more
WARN  - 2014-03-27 17:11:15.022; org.eclipse.jetty.server.Response;
Committed before 500 {msg=Broken
pipe,trace=org.eclipse.jetty.io.EofException
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:523)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:147)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
...
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:310)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:853)
... 35 more
,code=500}
WARN  - 2014-03-27 17:11:15.023; org.eclipse.jetty.servlet.ServletHandler;
/solr/collection1/select
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
...

Does anyone have some ideas about this?





Errors after upgrading to 4.6.1

2014-04-02 Thread Christopher Gross
I get both of these errors a few times in my tomcat (7.0.52) catalina.out
logfile:

2014-04-02 13:22:32,026 WARN  org.apache.solr.schema.FieldTypePluginLoader
- TokenFilterFactory is using deprecated LUCENE_33 emulation. You should at
some point declare and reindex to at least 4.0, because 3.x emulation is
deprecated and will be removed in 5.0

2014-04-02 13:22:32,138 WARN  org.apache.solr.schema.FieldTypePluginLoader
- TokenizerFactory is using deprecated LUCENE_33 emulation. You should at
some point declare and reindex to at least 4.0, because 3.x emulation is
deprecated and will be removed in 5.0

What should I be doing to fix them?  Is there a replacement for those
classes?  Do I just need to change the luceneMatchVersion to be LUCENE_461
or something?

Tried some google searches but they proved fruitless.

Thanks!

-- Chris


[ANNOUNCE] Apache Solr 4.7.1 released

2014-04-02 Thread Steve Rowe
April 2014, Apache Solr™ 4.7.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.7.1

Solr is the popular, blazing fast, open source NoSQL search platform from the 
Apache Lucene project. Its major features include powerful full-text search, 
hit highlighting, faceted search, dynamic clustering, database integration, 
rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly 
scalable, providing fault tolerant distributed search and indexing, and powers 
the search and navigation features of many of the world's largest internet 
sites.

Solr 4.7.1 is available for immediate download at:

http://lucene.apache.org/solr/mirrors-solr-latest-redir.html 

Solr 4.7.1 includes 28 bug fixes and one new configuration setting, as well as 
Lucene 4.7.1 and its bug fixes.

See the CHANGES.txt file included with the release for a full list of changes 
and further details.

Please report any feedback to the mailing lists 
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for 
distributing releases. It is possible that the mirror you are using may not 
have replicated the release yet. If that is the case, please try another 
mirror. This also goes for Maven access.



Re: eDismax parser and the mm parameter

2014-04-02 Thread Ahmet Arslan
Hi SL,

Instead of fuzzy queries, can't you use the spell checker? Generally the spell
checker (a.k.a. "did you mean") is the preferred tool for typos.
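
A minimal SolrJ sketch of the spell-check route (assumes a spellcheck component is already configured in solrconfig.xml; names and URL are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DidYouMeanSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/herbs");

        SolrQuery q = new SolrQuery("Whte Sberia Ginsng");
        q.set("spellcheck", "true");
        q.set("spellcheck.collate", "true"); // ask for a corrected whole query

        QueryResponse rsp = server.query(q);
        if (rsp.getSpellCheckResponse() != null) {
            System.out.println("Did you mean: "
                    + rsp.getSpellCheckResponse().getCollatedResult());
        }
    }
}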

Ahmet

On Wednesday, April 2, 2014 4:13 PM, "simpleliving...@gmail.com" 
 wrote:

It only works for a single-word search term, not for multi-word search terms.

Sent from my HTC

- Reply message -
From: "William Bell" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 12:03 AM

Fuzzy is provided use ~


On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:

> Jack,
>
> Thanks a lot, I am now using pf, pf2 and pf3 and have gotten rid of
> the mm parameter in my queries. However, for the fuzzy phrase queries, I
> am not sure how I would be able to leverage the Complex Query Parser; there
> is absolutely nothing out there that gives me any idea as to how to do
> that.
>
> Why is fuzzy phrase search not provided by Solr OOB? I am surprised.
>
> Thanks.
>
>
> On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  >wrote:
>
> > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > (the default) and ignore the mm parameter. Give pf the highest boost, and
> > boost pf3 higher than pf2.
> >
> > You could try using the complex phrase query parser for the third case.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: S.L
> > Sent: Monday, March 31, 2014 12:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: eDismax parser and the mm parameter
> >
> > Thanks Jack , my use cases are as follows.
> >
> >
> >   1. Search for "Ginseng" everything related to ginseng should show up.
> >   2. Search For "White Siberian Ginseng" results with the whole phrase
> >   show up first followed by 2 words from the phrase followed by a single
> > word
> >   in the phrase
> >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> >   documents with White Siberian Ginseng Should show up , this looks like
> > the
> >   most complicated of all as Solr does not support fuzzy phrase searches
> .
> > (I
> >   have no solution for this yet).
> >
> > Thanks again!
> >
> >
> > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> >  The mm parameter is really only relevant when the default operator is OR
> >> or explicit OR operators are used.
> >>
> >> Again: Please provide your use case examples and your expectations for
> >> each use case. It really doesn't make a lot of sense to prematurely
> focus
> >> on a solution when you haven't clearly defined your use cases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: S.L
> >> Sent: Sunday, March 30, 2014 9:13 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: eDismax parser and the mm parameter
> >>
> >> Jack,
> >>
> >> I mis-stated the problem , I am not using the OR operator as default
> >> now(now that I think about it it does not make sense to use the default
> >> operator OR along with the mm parameter) , the reason I want to use pf
> and
> >> mm in conjunction is because of my understanding of the edismax parser
> and
> >> I have not looked into pf2 and pf3 parameters yet.
> >>
> >> I will state my understanding here below.
> >>
> >> Pf -  Is used to boost the result score if the complete phrase matches.
> >> mm <(less than) search term length would help limit the query results
>  to
> >> a
> >> certain number of better matches.
> >>
> >> With that being said would it make sense to have dynamic mm (set to the
> >> length of search term - 1)?
> >>
> >> I also have a question around using a fuzzy search along with eDismax
> >> parser , but I will ask that in a seperate post once I go thru that
> aspect
> >> of eDismax parser.
> >>
> >> Thanks again !
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> j...@basetechnology.com>
> >> wrote:
> >>
> >>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> >>
> >>> will be dwarfed.
> >>>
> >>> The general goal is to assure that the top documents really are the
> best,
> >>> not to necessarily limit the total document count. Focusing on the
> latter
> >>> could be a real waste of time.
> >>>
> >>> It's still not clear why or how you need or want to use OR as the
> default
> >>> operator - you still haven't given us a use case for that.
> >>>
> >>> To repeat: Give us a full set of use cases before taking this XY
> Problem
> >>> approach of pursuing a solution before the problem is understood.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: S.L
> >>> Sent: Sunday, March 30, 2014 6:14 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: eDismax parser and the mm parameter
> >>>
> >>> Jacks Thanks Again,
> >>>
> >>> I am searching  Chinese medicine  documents , as the example I gave
> >>> earlier
> >>> a user can search for "Ginseng" or Siberian Ginseng or Red Siberian
> >>> Ginseng
> >>> , I certainly want to use pf parameter (which is not driven b

Re: Luke 4.7.0 released

2014-04-02 Thread Joshua P
Hi there! 

I'm receiving the following errors when trying to run luke-with-deps.jar

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the 
UncaughtExceptionHandler in thread "main"

Any ideas? 

On Monday, March 10, 2014 5:20:05 PM UTC-4, Dmitry Kan wrote:
>
> Hello!
>
> Luke 4.7.0 has been released. Download it here:
>
> https://github.com/DmitryKey/luke/releases/tag/4.7.0
>
> Release based on pull request of Petri Kivikangas (
> https://github.com/DmitryKey/luke/pull/2) Kiitos (thanks), Petri!
>
> Tested against the solr-4.7.0 index.
>
> 1. Upgraded maven plugins.
> 2. Added simple Windows launch script: In Windows, Luke can now be 
> launched easily by executing luke.bat. Script sets MaxPermSize to 512m 
> because Luke was found to crash on lower settings.
>
> Best regards,
>
> Dmitry Kan
>
> -- 
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>  


Re: sort by an attribute values sequence

2014-04-02 Thread Ahmet Arslan
Hi,

How many distinct producttype values do you have?

Maybe

q=C^5000 OR B^4000 OR A^3000 OR D&df=producttype

could work.

If you can come up with a function that takes its maximum value when
producttype=C, etc., you can sort by function queries too.
http://wiki.apache.org/solr/FunctionQuery
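
A sketch of the function-query variant, with the weights following the example above (assumes producttype is a single-valued indexed string field, so termfreq() returns 1 for the matching value and 0 otherwise):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ProductTypeSortSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");

        SolrQuery q = new SolrQuery("*:*");
        // Ranks C highest, then B, A, D.
        q.set("sort", "sum(product(termfreq(producttype,'C'),4),"
                + "product(termfreq(producttype,'B'),3),"
                + "product(termfreq(producttype,'A'),2),"
                + "termfreq(producttype,'D')) desc");

        System.out.println(server.query(q).getResults().getNumFound());
    }
}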


Ahmet


On Wednesday, April 2, 2014 1:52 PM, santosh sidnal  
wrote:
Re-sending my e-mail. Any pointers/links on this issue would help me a lot.

Thanks in advance.


On Tue, Apr 1, 2014 at 4:25 PM, santosh sidnal wrote:

> Hi All,
>
> We have a specific requirement of sorting the products as per a specific
> attribute value sequence. Any pointer or source of info would help us.
>
> Example of the scenario;
>
> Let's say for a search result I want to sort results based on an attribute
> producttype, where producttype has the following values: A, B, C, D.
>
> In the solr query I can give either producttype asc or producttype desc.
>
> But I want to get results in a specific order, by saying: first give me all
> results with value 'C', then B, A, D.
>
>
> --
> Regards,
> Santosh Sidnal

>
>


-- 
Regards,
Santosh Sidnal



Re: The word "no" in a query

2014-04-02 Thread Ahmet Arslan
Hi Bob,

Your field type would be useful here. Can you copy-paste it?

Ahmet



On Wednesday, April 2, 2014 2:01 PM, François Schiettecatte 
 wrote:
Have you looked at the debugging output?

    http://wiki.apache.org/solr/CommonQueryParameters#Debugging

François


On Apr 2, 2014, at 1:37 AM, Bob Laferriere  wrote:

> 
> I have built a commerce search engine. I am struggling with the word “no” in
> queries. We have products that are “No Smoking Sign.” When the query is
> “Smoking AND Sign” the product is found. If I query as “No AND Sign” I get no
> results. I do not have “no” as a stop word. Any ideas why I would get zero
> results back?
> 
> Regards,
> 
> Bob



Re: transaction log size

2014-04-02 Thread Erick Erickson
On the surface, this doesn't make sense, I'd expect that the tlogs
would be roughly the same size on leaders and replicas. Or at least
show the same variance.

If you were to guess at the volume, in terms of files being fired at
the index, how much would you expect in 30 seconds? And does it
approximate the size you're seeing in your tlogs (actually 2x your
data transmission rate over 30 seconds)?

Hard commits with openSearcher=false are actually pretty cheap
operations. About all they do is close the currently open segments and
truncate the tlog. What happens if you drop it to 10 seconds?
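
For reference, the hard-commit interval lives in the updateHandler section of solrconfig.xml; a 10-second variant of the settings described above would look something like this (openSearcher=false keeps the commit cheap, as described):

<autoCommit>
  <maxTime>10000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>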

 Best,
Erick

On Wed, Apr 2, 2014 at 4:04 AM, Gurfan  wrote:
> Thanks Shawn for the quick reply.
>
> We are using Solr Cloud version 4.6.1
>
> Usually we see a higher transaction log on the replica; the leader's tlog
> size is in KBs. We also tried keeping the hard commit (autoCommit) at 20 sec
> and autoSoftCommit at 30 sec.
>
> We wrote a script to monitor the disk usage of the tlog directory at
> 1-minute intervals, and noticed that the logs are purged at particular
> times. For instance: the tlog starts at ~4MB and increases at some point to
> 20MB, 50MB, 220MB, 600MB, then it shrinks back to ~10MB.
>
> http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> How can we keep the tlog size at its lowest, so that our system restart
> time will be less?
>
> Thanks,
> --Gurfan
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/transaction-log-size-tp4128354p4128547.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial maxDistErr changes

2014-04-02 Thread David Smiley
Good question Steve,

You'll have to re-index right off.

~ David
p.s. Sorry I didn't reply sooner; I just switched jobs and reconfigured my
mailing list subscriptions



Steven Bower wrote
> If I am only indexing point shapes and I want to change the maxDistErr from
> 0.09 (1m res) to 0.00045, will this "break" (as in searches stop working),
> or will search work but any performance gain won't be seen until all docs
> are reindexed? Or will I have to reindex right off?
> 
> thanks,
> 
> steve





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 Independent Lucene/Solr search consultant, 
http://www.linkedin.com/in/davidwsmiley


Get number of documents in a new (not visible) Searcher

2014-04-02 Thread Oliver Schrenk
Hi,

We have a SolrCloud 4.7 cluster with five machines and index in a distributed 
fashion.

When finished adding and deleting documents, we want to commit programmatically
and switch to a new searcher. But before doing that we want to make a final
check that the number of documents has not changed dramatically.

How do I check the number of documents in the new but not yet opened index?

Regards
Oliver


Re: Luke 4.7.0 released

2014-04-02 Thread simon
Also seeing this on Mac OS X.

java version = Java(TM) SE Runtime Environment (build 1.7.0_51-b13)


On Wed, Apr 2, 2014 at 11:01 AM, Joshua P  wrote:

> Hi there!
>
> I'm receiving the following errors when trying to run luke-with-deps.jar
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
> Exception in thread "main"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "main"
>
> Any ideas?
>
> On Monday, March 10, 2014 5:20:05 PM UTC-4, Dmitry Kan wrote:
>>
>> Hello!
>>
>> Luke 4.7.0 has been released. Download it here:
>>
>> https://github.com/DmitryKey/luke/releases/tag/4.7.0
>>
>> Release based on pull request of Petri Kivikangas (
>> https://github.com/DmitryKey/luke/pull/2) Kiitos, Petri!
>>
>> Tested against the solr-4.7.0 index.
>>
>> 1. Upgraded maven plugins.
>> 2. Added simple Windows launch script: In Windows, Luke can now be
>> launched easily by executing luke.bat. Script sets MaxPermSize to 512m
>> because Luke was found to crash on lower settings.
>>
>> Best regards,
>>
>> Dmitry Kan
>>
>> --
>> Blog: http://dmitrykan.blogspot.com
>> Twitter: http://twitter.com/dmitrykan
>>
>


Re: Return Solr docs in a specific order by list of ids

2014-04-02 Thread marotosg
I found an easy solution, which is using boosting:

(PersonID:459)^0.6 OR (PersonID:185)^0.5 OR (PersonID:569)^0.4 OR
(PersonID:8)^0.3 OR (PersonID:1)^0.2 OR (PersonID:896) ^0.1
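
A small SolrJ sketch that builds the same kind of boosted query from an ordered id list; the PersonID field name comes from the message above, and the integer-boost scheme is an illustrative variation:

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;

public class OrderedIdQuerySketch {
    public static void main(String[] args) {
        List<String> ids = Arrays.asList("459", "185", "569", "8", "1", "896");

        // Give the first id the largest boost so score order follows list order.
        StringBuilder sb = new StringBuilder();
        int boost = ids.size();
        for (String id : ids) {
            if (sb.length() > 0) sb.append(" OR ");
            sb.append("(PersonID:").append(id).append(")^").append(boost--);
        }
        // (PersonID:459)^6 OR (PersonID:185)^5 OR ... OR (PersonID:896)^1
        SolrQuery q = new SolrQuery(sb.toString());
        System.out.println(q.getQuery());
    }
}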





Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread solr-user
Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", "former",
"fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
for them (by design).

The original question remains: is there a tokenizer/plugin that will allow
me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it
would cause way too many results to get returned if I, for example, searched
on "new" or "st" ... However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st john"
or ...





Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Jack Krupansky
Query by phrase is a core feature of tokenized text in Lucene and Solr, so 
there is no need to use a pattern token filter for that purpose. And yes, 
doing so pretty much breaks most token filters that would assume that the 
text is tokenized.


-- Jack Krupansky

-Original Message- 
From: solr-user

Sent: Wednesday, April 2, 2014 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", "former",
"fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
for them (by design).

The original question remains: is there a tokenizer/plugin that will allow
me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it
would cause way too many results to get returned if I, for example, searched
on "new" or "st" ... However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st john"
or ...






Postings Format for Span queries on big index

2014-04-02 Thread Gopal Agarwal
Does Lucene 4.6 use Lucene41PostingsFormat for Postings.nextdoc() while
executing span queries?

When I am debugging the Lucene 4.6 test cases for span queries, it shows
that the above nextdoc() call utilizes DirectPostingsFormat.

My requirement is to run multiple span queries like "cat dog"~2 on 2 TB of
index, and I am worried about the performance as I have to collect all the
docs in the results.

For better performance:
Is there a better postings format to choose while using span queries in
Solr 4.6 or Solr 4.7, given that we have a lot of formats to choose from?

Does having termVectors=true, or termPositions=true and termOffsets=true,
help?
If yes, should I think about what to use as the term vector format?

Thanks,
Gopal


Analysis of Japanese characters

2014-04-02 Thread Shawn Heisey
My company is setting up a system for a customer from Japan.  We have an 
existing system that handles primarily English.


Here's my general text analysis chain:

http://apaste.info/xa5

After talking to the customer about problems they are encountering with 
search, we have determined that some of the problems are caused because 
ICUTokenizer splits on *any* character set change, including changes 
between different Japanese character sets.


Knowing the risk of this being an XY problem, here's my question: Can 
someone help me develop a rule file for the ICU Tokenizer that will 
*not* split when the character set changes from one of the japanese 
character sets to another japanese character set, but still split on 
other character set changes?


Thanks,
Shawn



Re: Analysis of Japanese characters

2014-04-02 Thread Tom Burton-West
Hi Shawn,

I'm not sure I understand the problem and why you need to solve it at the
ICUTokenizer level rather than the CJKBigramFilter
Can you perhaps give a few examples of the problem?

Have you looked at the flags for the CJKBigramfilter?
You can tell it to make bigrams of different Japanese character sets.  For
example the config given in the JavaDocs tells it to make bigrams across 3
of the different Japanese character sets.  (Is the issue related to Romaji?)

 



http://lucene.apache.org/core/4_7_1/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilterFactory.html

Tom


On Wed, Apr 2, 2014 at 1:19 PM, Shawn Heisey  wrote:

> My company is setting up a system for a customer from Japan.  We have an
> existing system that handles primarily English.
>
> Here's my general text analysis chain:
>
> http://apaste.info/xa5
>
> After talking to the customer about problems they are encountering with
> search, we have determined that some of the problems are caused because
> ICUTokenizer splits on *any* character set change, including changes
> between different Japanese character sets.
>
> Knowing the risk of this being an XY problem, here's my question: Can
> someone help me develop a rule file for the ICU Tokenizer that will *not*
> split when the character set changes from one of the japanese character
> sets to another japanese character set, but still split on other character
> set changes?
>
> Thanks,
> Shawn
>
>


Re: Analysis of Japanese characters

2014-04-02 Thread Shawn Heisey

On 4/2/2014 11:33 AM, Tom Burton-West wrote:

Hi Shawn,

I'm not sure I understand the problem and why you need to solve it at the
ICUTokenizer level rather than the CJKBigramFilter
Can you perhaps give a few examples of the problem?

Have you looked at the flags for the CJKBigramfilter?
You can tell it to make bigrams of different Japanese character sets.  For
example the config given in the JavaDocs tells it to make bigrams across 3
of the different Japanese character sets.  (Is the issue related to Romaji?)

<filter class="solr.CJKBigramFilterFactory"
        han="true" hiragana="true"
        katakana="true" outputUnigrams="false"/>
http://lucene.apache.org/core/4_7_1/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilterFactory.html

Tom


On Wed, Apr 2, 2014 at 1:19 PM, Shawn Heisey  wrote:


My company is setting up a system for a customer from Japan.  We have an
existing system that handles primarily English.

Here's my general text analysis chain:

http://apaste.info/xa5

After talking to the customer about problems they are encountering with
search, we have determined that some of the problems are caused because
ICUTokenizer splits on *any* character set change, including changes
between different Japanese character sets.

Knowing the risk of this being an XY problem, here's my question: Can
someone help me develop a rule file for the ICU Tokenizer that will *not*
split when the character set changes from one of the japanese character
sets to another japanese character set, but still split on other character
set changes?


Because of what ICUTokenizer does, by the time it makes it to the bigram 
filter, they're already separate terms.


Simplifying to English, let's pretend that upper and lowercase letters
are in different character sets.  The original term is abCD.  You expect
that by the end of the analysis, you'll have ab bC CD.  With the
ICUTokenizer, you end up with just ab CD.


The index side is more complex because of outputUnigrams.  We are still 
deciding whether we want to keep that parameter set, but that's a 
separate issue, one that we know how to resolve without help.


Thanks,
Shawn



PDF Indexing

2014-04-02 Thread Sujatha Arun
Hi,

I am able to use Tika and DIH to index a PDF as a single document. However,
I need each page to be a single document. Is there any built-in mechanism to
achieve this, or do I have to use PDFBox or some other tool to achieve it?

Regards


Re: AND not as a boolean operator in Phrase

2014-04-02 Thread abhishek jain
Hi,
OK, thanks.
I want to search for the phrase "A and B" with the word *and* sandwiched
between A and B. I don't want *and* treated as a boolean operator when it is
within quotes.

I have *and* as a stop word and I don't want to reindex the data.

What is my best bet?

thanks
abhishek jain


On Sun, Mar 30, 2014 at 2:33 AM, Bob Laferriere wrote:

> If you are using edismax you need to use AND. So A AND B will ignore the
> stop word and apply the Boolean operator. You can configure edismax to
> ignore Boolean stop words that are lowercase.
>
> Regards,
>
> Bob
>
> > On Mar 26, 2014, at 2:39 AM, abhishek jain 
> wrote:
> >
> > Hi Jack,
> > You are right, I am using 'and' as a stop word in both indexing and
> > querying.
> >
> > Should I use it only during indexing?
> >
> > thanks
> >
> >
> >
> > On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky <
> j...@basetechnology.com>wrote:
> >
> >> What does your field type analyzer look like?
> >>
> >> I suspect that you have a stop filter which cause "and" to be removed.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: abhishek jain Sent: Tuesday, March 25,
> >> 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a
> >> boolean operator in Phrase
> >> hi friends,
> >>
> >> when i search for "A and B" it gives me result for A , B , i am not sure
> >> why?
> >>
> >> Please guide how can i exact match when it is within phrase/quotes.
> >>
> >> --
> >> Thanks and kind Regards,
> >> Abhishek jain
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767


Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Erick Erickson
No, there isn't a tokenizer that'll do what you want that I know
about. Really, I suspect you need to back up a bit and re-think the
problem. It looks to me like you've taken a path that's going to cause
you endless grief when, as Jack says, phrase searches are built in to
the tokenization process.

Best,
Erick


On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky  wrote:
> Query by phrase is a core feature of tokenized text in Lucene and Solr, so
> there is no need to use a pattern token filter for that purpose. And yes,
> doing so pretty much breaks most token filters that would assume that the
> text is tokenized.
>
> -- Jack Krupansky
>
> -Original Message- From: solr-user
> Sent: Wednesday, April 2, 2014 12:46 PM
> To: solr-user@lucene.apache.org
>
> Subject: Re: how do I get search for "fort st john" to match "ft saint john"
>
> Hi Eric.
>
> No, that doesn't fix the problem either (I have tested this previously and
> did so again just now)
>
> Since the PatternTokenizerFactory is not tokenizing on whitespace (by design
> since I want the user to search by phrase), the phrase "marina former fort
> ord" (for example) does not get turned into four tokens ("marina", "former",
> "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
> for them (by design)
>
> the original question remains: is there a tokenizer/plugin that will allow
> me to synonym words in a unbroken phrase?
>
> note: the reason I don't want to tokenize the data by whitespace is that it
> would cause way too many results to get returned if I, for example, search on
> "new" or "st" ...  However, I still want to be able to include "fort saint
> john" in the results if the user searches for "ft st john" or "fort st john"
> or ...
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: PDF Indexing

2014-04-02 Thread Ahmet Arslan
Hi Sujatha,

There is no built-in mechanism. Prepare per-page documents outside of Solr:
http://searchhub.org/2012/02/14/indexing-with-solrj/


And you may want to save the extracted text content somewhere too. If you change
something in the index analysis/schema, you need to reindex. If you have saved
the text data, you can at least skip the extraction phase.
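
For example, a rough sketch of that approach using PDFBox (1.8.x) and SolrJ --
the Solr URL, file name, and field names are assumptions for illustration, not
something from this thread:

import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // assumed URL
PDDocument pdf = PDDocument.load(new File("sample.pdf"));  // assumed file
PDFTextStripper stripper = new PDFTextStripper();
for (int page = 1; page <= pdf.getNumberOfPages(); page++) {
    stripper.setStartPage(page);  // restrict extraction to this one page
    stripper.setEndPage(page);
    String text = stripper.getText(pdf);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "sample.pdf_page_" + page);  // assumed field names
    doc.addField("page_i", page);
    doc.addField("content_t", text);
    server.add(doc);  // one Solr document per PDF page
}
pdf.close();
server.commit();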


Ahmet



On Wednesday, April 2, 2014 10:05 PM, Sujatha Arun  wrote:
Hi,

I am able to use Tika and DIH to index a PDF as a single document. However,
I need each page to be a single document. Is there any inbuilt mechanism to
achieve this, or do I have to use PDFBox or some other tool to achieve it?

Regards



Re: Get number of documents in a new (not visible) Searcher

2014-04-02 Thread Ahmet Arslan
Hi Oliver,

You can see stats like docsPending: 30 and adds: 30 in the update handler's
plugin stats section.

http://localhost:8983/solr/#/collection1/plugins/updatehandler?entry=updateHandler

These parameters are exposed via JMX. 
https://cwiki.apache.org/confluence/display/solr/Using+JMX+with+Solr

An alternative way is to use the MBean Request Handler:
https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

http://localhost:8983/solr/admin/mbeans?stats=true&cat=UPDATEHANDLER&wt=json
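
If you are checking this from SolrJ, a rough sketch along these lines should
work (the core URL is an assumption; the easiest way to find the exact nesting
of the stats in the response is simply to print it):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("cat", "UPDATEHANDLER");
params.set("stats", "true");
QueryRequest req = new QueryRequest(params);
req.setPath("/admin/mbeans");  // per-core MBean request handler
NamedList<Object> response = server.request(req);
// docsPending lives under the "solr-mbeans" entry; drill down from here
System.out.println(response.get("solr-mbeans"));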


Ahmet


On Wednesday, April 2, 2014 7:42 PM, Oliver Schrenk  
wrote:
Hi,

We have a SolrCloud 4.7 cluster with five machines and index in a distributed 
fashion.

When finished adding and deleting documents, we want to commit programmatically
and switch to a new searcher. But before doing that we want to make a final 
check that the number of documents has not changed dramatically.

How do I check the number of documents in the new but not yet open index?

Regards
Oliver


Re: AND not as a boolean operator in Phrase

2014-04-02 Thread Ahmet Arslan
Hi Abhishek,

Your best bet is the dismax query parser, which does not recognize and/AND as
an operator:

q="A and B"&qf=someField&defType=dismax

Ahmet


On Wednesday, April 2, 2014 10:01 PM, abhishek jain 
 wrote:
Hi,
Ok, thanks.
I want to search for the phrase "A and B" with the *and* word sandwiched
between A and B. I don't want "and" treated as a boolean operator when it
appears within quotes.

I have "and" as a stop word and I don't want to reindex the data.

What is my best bet?

thanks
abhishek jain


On Sun, Mar 30, 2014 at 2:33 AM, Bob Laferriere wrote:

> If you are using edismax you need to use AND. So A AND B will ignore the
> stop word and apply the Boolean operator. You can configure edismax to
> ignore Boolean stop words that are lowercase.
>
> Regards,
>
> Bob
>
> > On Mar 26, 2014, at 2:39 AM, abhishek jain 
> wrote:
> >
> > Hi Jack,
> > You are right, i am using 'and' as a stop word in both indexing and
> query,
> >
> > Should i use it only during  indexing?
> >
> > thanks
> >
> >
> >
> > On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky <
> j...@basetechnology.com>wrote:
> >
> >> What does your field type analyzer look like?
> >>
> >> I suspect that you have a stop filter which cause "and" to be removed.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: abhishek jain Sent: Tuesday, March 25,
> >> 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a
> >> boolean operator in Phrase
> >> hi friends,
> >>
> >> when i search for "A and B" it gives me result for A , B , i am not sure
> >> why?
> >>
> >> Please guide how can i exact match when it is within phrase/quotes.
> >>
> >> --
> >> Thanks and kind Regards,
> >> Abhishek jain
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767

>



-- 
Thanks and kind Regards,
Abhishek jain
+91 9971376767



Re: Analysis of Japanese characters

2014-04-02 Thread Tom Burton-West
Hi Shawn,

I may still be missing your point.  Below is an example where the
ICUTokenizer splits the input into unigrams.
Now, I'm beginning to wonder if I really understand what those flags on the
CJKBigramFilter do.
The ICUTokenizer spits out unigrams and the CJKBigramFilter will put them
back together into bigrams.

I thought if you set han=true, hiragana=true
you would get this kind of result, where the third bigram is composed of a
hiragana and a han character:

いろは革命歌 => "いろ" "ろは" "は革" "革命" "命歌"

Hopefully the e-mail hasn't munged the output of the Solr analysis panel
below:

I can see this in our query processing where outputUnigrams=false:
org.apache.solr.analysis.ICUTokenizerFactory {luceneMatchVersion=LUCENE_36}
Splits into unigrams
term text いろは革命歌
org.apache.solr.analysis.CJKBigramFilterFactory {hangul=false,
outputUnigrams=false, katakana=false, han=true, hiragana=true,
luceneMatchVersion=LUCENE_36}
makes bigrams, including the middle one, which is one hiragana character and
one han character:
term text いろ ろは は革 革命 命歌

It appears that if you include outputUnigrams=true (as we both do in the
indexing configuration) that this doesn't happen.
org.apache.solr.analysis.CJKBigramFilterFactory {hangul=false,
outputUnigrams=true, katakana=false, han=true, hiragana=true ,
luceneMatchVersion=LUCENE_36}
term text い ろ は 革 命 歌 革命 命歌 (the cross-script bigram は革 is missing)


Not sure what happens for katakana as the ICUTokenizer doesn't convert it
to unigrams and our configuration is set to katakana=false.   I'll play
around on the test machine when I have time.

Tom


Re: Analysis of Japanese characters

2014-04-02 Thread Shawn Heisey

On 4/2/2014 2:19 PM, Tom Burton-West wrote:

Hi Shawn,

I may still be missing your point.  Below is an example where the
ICUTokenizer splits the input into unigrams.
Now, I'm beginning to wonder if I really understand what those flags on the
CJKBigramFilter do.
The ICUTokenizer spits out unigrams and the CJKBigramFilter will put them
back together into bigrams.

I thought if you set  han=true, hiragana=true
You would get this kind of result where the third bigram is composed of a
hiragana and a han character


It looks like you are right.  I did not notice that the bigram filter 
was putting the tokens back together, even though the tokenizer was 
splitting them apart.  I might be worrying over nothing!  Thank you for 
taking some time to point out the obvious.


I did notice something odd, though.  Keep in mind that I have absolutely 
no idea what I am writing here, so I have no idea if this is valid at all:


For an input of 田中角栄 the bigram filter works like you described, and 
what I would expect.  If I add a space at the point where the ICU 
tokenizer would have split them anyway, the bigram filter output is very 
different.  Best guess: It notices that the end/start values from the 
original input are not consecutive, and therefore doesn't combine them.  
Like I said above, I may have nothing at all to worry about here.


Thanks,
Shawn



Re: Get number of documents in a new (not visible) Searcher

2014-04-02 Thread Ahmet Arslan
Hi Oliver,

You can get docsPending via the MBean Request Handler:

http://localhost:8983/solr/admin/mbeans?cat=UPDATEHANDLER&stats=true
https://cwiki.apache.org/confluence/display/solr/MBean+Request+Handler

Ahmet


On Wednesday, April 2, 2014 7:42 PM, Oliver Schrenk  
wrote:
Hi,

We have a SolrCloud 4.7 cluster with five machines and index in a distributed 
fashion.

When finished adding and deleting documents, we want to commit programmatically
and switch to a new searcher. But before doing that we want to make a final 
check that the number of documents has not changed dramatically.

How do I check the number of documents in the new but not yet open index?

Regards
Oliver


How to search one field and highlight another

2014-04-02 Thread Tang, Rebecca
Hi there,

For dates we create two Solr fields: date_display and date.
date_display: stored = true, indexed = false; it is for display purposes only
date: stored = false, indexed = true, it's used for searching, ordering and 
faceting

When users search on date, I need to be able to highlight date_display.  I'm 
not sure how to achieve this.  Is this possible?  Or do I have to rethink my 
index?

Thanks!

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library
E: rebecca.t...@ucsf.edu


Re: eDismax parser and the mm parameter

2014-04-02 Thread simpleliving...@gmail.com
Ahmet.

Thanks, I will look into this option. Does the spellchecker support multi-word
search terms?

Sent from my HTC

- Reply message -
From: "Ahmet Arslan" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 10:53 AM

Hi SL,

Instead of fuzzy queries, can't you use spell checker? Generally Spell Checker 
(a.k.a did you mean) is a preferred tool for typos.

Ahmet

On Wednesday, April 2, 2014 4:13 PM, "simpleliving...@gmail.com" 
 wrote:

It only works for a single-word search term and not multiple-word search terms.

Sent from my HTC

- Reply message -
From: "William Bell" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 12:03 AM

Fuzzy is provided use ~


On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:

> Jack ,
>
> Thanks a lot , I am now using the pf ,pf2 an pf3  and have gotten rid of
> the mm parameter from my queries, however for the fuzzy phrase queries , I
> am not sure how I would be able to leverage the Complex Query Parser there
> is absolutely nothing out there that gives me any idea as to how to do that
> .
>
> Why is fuzzy phrase search not provided by Solr OOB ? I am surprised
>
> Thanks.
>
>
> On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  >wrote:
>
> > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > (the default) and ignore the mm parameter. Give pf the highest boost, and
> > boost pf3 higher than pf2.
> >
> > You could try using the complex phrase query parser for the third case.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: S.L
> > Sent: Monday, March 31, 2014 12:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: eDismax parser and the mm parameter
> >
> > Thanks Jack , my use cases are as follows.
> >
> >
> >   1. Search for "Ginseng" everything related to ginseng should show up.
> >   2. Search For "White Siberian Ginseng" results with the whole phrase
> >   show up first followed by 2 words from the phrase followed by a single
> > word
> >   in the phrase
> >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> >   documents with White Siberian Ginseng Should show up , this looks like
> > the
> >   most complicated of all as Solr does not support fuzzy phrase searches
> .
> > (I
> >   have no solution for this yet).
> >
> > Thanks again!
> >
> >
> > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> >  The mm parameter is really only relevant when the default operator is OR
> >> or explicit OR operators are used.
> >>
> >> Again: Please provide your use case examples and your expectations for
> >> each use case. It really doesn't make a lot of sense to prematurely
> focus
> >> on a solution when you haven't clearly defined your use cases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: S.L
> >> Sent: Sunday, March 30, 2014 9:13 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: eDismax parser and the mm parameter
> >>
> >> Jack,
> >>
> >> I mis-stated the problem , I am not using the OR operator as default
> >> now(now that I think about it it does not make sense to use the default
> >> operator OR along with the mm parameter) , the reason I want to use pf
> and
> >> mm in conjunction is because of my understanding of the edismax parser
> and
> >> I have not looked into pf2 and pf3 parameters yet.
> >>
> >> I will state my understanding here below.
> >>
> >> Pf -  Is used to boost the result score if the complete phrase matches.
> >> mm <(less than) search term length would help limit the query results
>  to
> >> a
> >> certain number of better matches.
> >>
> >> With that being said would it make sense to have dynamic mm (set to the
> >> length of search term - 1)?
> >>
> >> I also have a question around using a fuzzy search along with eDismax
> >> parser , but I will ask that in a seperate post once I go thru that
> aspect
> >> of eDismax parser.
> >>
> >> Thanks again !
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> j...@basetechnology.com>
> >> wrote:
> >>
> >>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> >>
> >>> will be dwarfed.
> >>>
> >>> The general goal is to assure that the top documents really are the
> best,
> >>> not to necessarily limit the total document count. Focusing on the
> latter
> >>> could be a real waste of time.
> >>>
> >>> It's still not clear why or how you need or want to use OR as the
> default
> >>> operator - you still haven't given us a use case for that.
> >>>
> >>> To repeat: Give us a full set of use cases before taking this XY
> Problem
> >>> approach of pursuing a solution before the problem is understood.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: S.L
> >>> Sent: Sunday, March 30, 2014 6:14 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: eDismax parser and the mm parameter

Re: Errors after upgrading to 4.6.1

2014-04-02 Thread Chris Hostetter

: What should I be doing to fix them?  Is there a replacement for those
: classes?  Do I just need to change the luceneMatchVersion to be LUCENE_461
: or something?

That's pretty much exactly what that warning message is trying to tell you 
-- your config says to use LUCENE_33 mode, but that won't work once 5.0 
comes out. So at some point you need to increase that to at least 
LUCENE_40 and re-index to make the warning go away (and to be able to 
upgrade to Solr 5.0).
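
For reference, the setting lives near the top of solrconfig.xml; on 4.6.1 you
would bump it to something like this and then re-index:

<luceneMatchVersion>LUCENE_46</luceneMatchVersion>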

If you have suggestions for how to improve the warning message please let 
us know.




-Hoss
http://www.lucidworks.com/


Re: Block Join Parent Query across children docs

2014-04-02 Thread Chris Hostetter

: Thanks for your response. Here is an example of what I'm trying to do. If I
: had the following documents:

What you are attempting is fairly trivial -- you want to query for all 
parent documents, then apply 3 filters:

 * parent of a child matching item1
 * parent of a child matching item2
 * not a parent of a child matching item3

Part of your problem may be that (in the example you posted, anyway) 
you appear to be trying to use a *string* field for listing multiple terms 
with commas and then seem to want to match on those individual terms -- 
that's not going to work.  either make your string field a true 
multivalued field, or use a text field with tokenization.

With the modified example data you provided below (using search_t instead 
of search_s) this query seems to do exactly what you want...

http://localhost:8983/solr/select?p_filt=type_s:parent&q=*:*&fq={!parent%20which=$p_filt}search_t:item2&fq={!parent%20which=$p_filt}search_t:item1&fq=-{!parent%20which=$p_filt}search_t:item3

 q = *:*
p_filt = type_s:parent
wt = json
fq =  {!parent which=$p_filt}search_t:item2
fq =  {!parent which=$p_filt}search_t:item1
fq = -{!parent which=$p_filt}search_t:item3


-Hoss
http://www.lucidworks.com/


Re: eDismax parser and the mm parameter

2014-04-02 Thread Ahmet Arslan
Yes, it has a spellcheck.collate parameter. It has lots of parameters, and with
the correct combination of them it can suggest "White Siberian Ginseng" from
"Whte Sberia Ginsng":

https://cwiki.apache.org/confluence/display/solr/Spell+Checking 
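
As a rough sketch, a request along these lines asks for a collated suggestion
(the exact behaviour depends on how the spellcheck component is configured in
your solrconfig.xml, so treat the parameters as a starting point):

q=Whte Sberia Ginsng&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollations=1

The collation step recombines the per-word corrections into a single rewritten
query, which is what gives you multi-word "did you mean" suggestions.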




On Thursday, April 3, 2014 1:57 AM, "simpleliving...@gmail.com" 
 wrote:
Ahmet.

Thanks, I will look into this option. Does the spellchecker support multi-word
search terms?

Sent from my HTC

- Reply message -
From: "Ahmet Arslan" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 10:53 AM

Hi SL,

Instead of fuzzy queries, can't you use spell checker? Generally Spell Checker 
(a.k.a did you mean) is a preferred tool for typos.

Ahmet

On Wednesday, April 2, 2014 4:13 PM, "simpleliving...@gmail.com" 
 wrote:

It only works for a single-word search term and not multiple-word search terms.

Sent from my HTC

- Reply message -
From: "William Bell" 
To: "solr-user@lucene.apache.org" 
Subject: eDismax parser and the mm parameter
Date: Wed, Apr 2, 2014 12:03 AM

Fuzzy is provided use ~


On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:

> Jack ,
>
> Thanks a lot , I am now using the pf ,pf2 an pf3  and have gotten rid of
> the mm parameter from my queries, however for the fuzzy phrase queries , I
> am not sure how I would be able to leverage the Complex Query Parser there
> is absolutely nothing out there that gives me any idea as to how to do that
> .
>
> Why is fuzzy phrase search not provided by Solr OOB ? I am surprised
>
> Thanks.
>
>
> On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  >wrote:
>
> > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > (the default) and ignore the mm parameter. Give pf the highest boost, and
> > boost pf3 higher than pf2.
> >
> > You could try using the complex phrase query parser for the third case.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: S.L
> > Sent: Monday, March 31, 2014 12:08 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: eDismax parser and the mm parameter
> >
> > Thanks Jack , my use cases are as follows.
> >
> >
> >   1. Search for "Ginseng" everything related to ginseng should show up.
> >   2. Search For "White Siberian Ginseng" results with the whole phrase
> >   show up first followed by 2 words from the phrase followed by a single
> > word
> >   in the phrase
> >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> >   documents with White Siberian Ginseng Should show up , this looks like
> > the
> >   most complicated of all as Solr does not support fuzzy phrase searches
> .
> > (I
> >   have no solution for this yet).
> >
> > Thanks again!
> >
> >
> > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> j...@basetechnology.com>
> > wrote:
> >
> >  The mm parameter is really only relevant when the default operator is OR
> >> or explicit OR operators are used.
> >>
> >> Again: Please provide your use case examples and your expectations for
> >> each use case. It really doesn't make a lot of sense to prematurely
> focus
> >> on a solution when you haven't clearly defined your use cases.
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: S.L
> >> Sent: Sunday, March 30, 2014 9:13 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: eDismax parser and the mm parameter
> >>
> >> Jack,
> >>
> >> I mis-stated the problem , I am not using the OR operator as default
> >> now(now that I think about it it does not make sense to use the default
> >> operator OR along with the mm parameter) , the reason I want to use pf
> and
> >> mm in conjunction is because of my understanding of the edismax parser
> and
> >> I have not looked into pf2 and pf3 parameters yet.
> >>
> >> I will state my understanding here below.
> >>
> >> Pf -  Is used to boost the result score if the complete phrase matches.
> >> mm <(less than) search term length would help limit the query results
>  to
> >> a
> >> certain number of better matches.
> >>
> >> With that being said would it make sense to have dynamic mm (set to the
> >> length of search term - 1)?
> >>
> >> I also have a question around using a fuzzy search along with eDismax
> >> parser , but I will ask that in a seperate post once I go thru that
> aspect
> >> of eDismax parser.
> >>
> >> Thanks again !
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> j...@basetechnology.com>
> >> wrote:
> >>
> >>  If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
> >>
> >>> will be dwarfed.
> >>>
> >>> The general goal is to assure that the top documents really are the
> best,
> >>> not to necessarily limit the total document count. Focusing on the
> latter
> >>> could be a real waste of time.
> >>>
> >>> It's still not clear why or how you need or want to use OR as the
> default
> >>> operator - you still haven't given us a use case for that.
> >>>
> >>> To repeat: Give us a full set of

Re: eDismax parser and the mm parameter

2014-04-02 Thread S.L
Thanks Ahmet, I would definitely look into this . I appreciate that.


On Wed, Apr 2, 2014 at 7:47 PM, Ahmet Arslan  wrote:

> Yes, it has spellcheck.collate parameter. I mean it has lots of parameters
> and with correct combination of parameters
> it can suggest "White Siberian Ginseng" from "Whte Sberia Ginsng"
>
> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
>
>
>
> On Thursday, April 3, 2014 1:57 AM, "simpleliving...@gmail.com" <
> simpleliving...@gmail.com> wrote:
> Ahmet.
>
> Thanks I will look into this option . Does spellchecker support multiple
> word search terms?
>
> Sent from my HTC
>
> - Reply message -
> From: "Ahmet Arslan" 
> To: "solr-user@lucene.apache.org" 
> Subject: eDismax parser and the mm parameter
> Date: Wed, Apr 2, 2014 10:53 AM
>
> Hi SL,
>
> Instead of fuzzy queries, can't you use spell checker? Generally Spell
> Checker (a.k.a did you mean) is a preferred tool for typos.
>
> Ahmet
>
> On Wednesday, April 2, 2014 4:13 PM, "simpleliving...@gmail.com" <
> simpleliving...@gmail.com> wrote:
>
> It only works for a single word search term and not multiple word search
> term.
>
> Sent from my HTC
>
> - Reply message -
> From: "William Bell" 
> To: "solr-user@lucene.apache.org" 
> Subject: eDismax parser and the mm parameter
> Date: Wed, Apr 2, 2014 12:03 AM
>
> Fuzzy is provided use ~
>
>
> On Mon, Mar 31, 2014 at 11:04 PM, S.L  wrote:
>
> > Jack ,
> >
> > Thanks a lot , I am now using the pf ,pf2 an pf3  and have gotten rid of
> > the mm parameter from my queries, however for the fuzzy phrase queries ,
> I
> > am not sure how I would be able to leverage the Complex Query Parser
> there
> > is absolutely nothing out there that gives me any idea as to how to do
> that
> > .
> >
> > Why is fuzzy phrase search not provided by Solr OOB ? I am surprised
> >
> > Thanks.
> >
> >
> > On Mon, Mar 31, 2014 at 5:39 AM, Jack Krupansky  > >wrote:
> >
> > > The pf, pf2, and pf3 parameters should cover cases 1 and 2. Use q.op=OR
> > > (the default) and ignore the mm parameter. Give pf the highest boost,
> and
> > > boost pf3 higher than pf2.
> > >
> > > You could try using the complex phrase query parser for the third case.
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: S.L
> > > Sent: Monday, March 31, 2014 12:08 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: eDismax parser and the mm parameter
> > >
> > > Thanks Jack , my use cases are as follows.
> > >
> > >
> > >   1. Search for "Ginseng" everything related to ginseng should show up.
> > >   2. Search For "White Siberian Ginseng" results with the whole phrase
> > >   show up first followed by 2 words from the phrase followed by a
> single
> > > word
> > >   in the phrase
> > >   3. Fuzzy Search "Whte Sberia Ginsng" (please note the typos here)
> > >   documents with White Siberian Ginseng Should show up , this looks
> like
> > > the
> > >   most complicated of all as Solr does not support fuzzy phrase
> searches
> > .
> > > (I
> > >   have no solution for this yet).
> > >
> > > Thanks again!
> > >
> > >
> > > On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky <
> > j...@basetechnology.com>
> > > wrote:
> > >
> > >  The mm parameter is really only relevant when the default operator is
> OR
> > >> or explicit OR operators are used.
> > >>
> > >> Again: Please provide your use case examples and your expectations for
> > >> each use case. It really doesn't make a lot of sense to prematurely
> > focus
> > >> on a solution when you haven't clearly defined your use cases.
> > >>
> > >> -- Jack Krupansky
> > >>
> > >> -Original Message- From: S.L
> > >> Sent: Sunday, March 30, 2014 9:13 PM
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: eDismax parser and the mm parameter
> > >>
> > >> Jack,
> > >>
> > >> I mis-stated the problem , I am not using the OR operator as default
> > >> now(now that I think about it it does not make sense to use the
> default
> > >> operator OR along with the mm parameter) , the reason I want to use pf
> > and
> > >> mm in conjunction is because of my understanding of the edismax parser
> > and
> > >> I have not looked into pf2 and pf3 parameters yet.
> > >>
> > >> I will state my understanding here below.
> > >>
> > >> Pf -  Is used to boost the result score if the complete phrase
> matches.
> > >> mm <(less than) search term length would help limit the query results
> >  to
> > >> a
> > >> certain number of better matches.
> > >>
> > >> With that being said would it make sense to have dynamic mm (set to
> the
> > >> length of search term - 1)?
> > >>
> > >> I also have a question around using a fuzzy search along with eDismax
> > >> parser , but I will ask that in a seperate post once I go thru that
> > aspect
> > >> of eDismax parser.
> > >>
> > >> Thanks again !
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky <
> > j...@basetechnology.com>
> > >> wrote:
> > >>
> > >>  If you use pf, pf2,

Re: Errors on index in SolrCloud: ConcurrentUpdateSolrServer$Runner.run()

2014-04-02 Thread rulinma
org.apache.solr.common.SolrException: Bad Request



request:
http://192.168.22.35:8080/solr/collection_networkSchool_shard1_replica3/update?update.distrib=FROMLEADER&distrib.from=http%3A%2F%2F192.168.22.34%3A8080%2Fsolr%2Fcollection_networkSchool_shard1_replica1%2F&wt=javabin&version=2
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Similar to this.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-on-index-in-SolrCloud-ConcurrentUpdateSolrServer-Runner-run-tp4107661p4128748.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Errors on index in SolrCloud: ConcurrentUpdateSolrServer$Runner.run()

2014-04-02 Thread rulinma
I see. It is a config problem. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-on-index-in-SolrCloud-ConcurrentUpdateSolrServer-Runner-run-tp4107661p4128751.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PDF Indexing

2014-04-02 Thread Jack Krupansky
I see that the PDFBox library (which is what Tika uses for PDF files) has 
methods to manipulate individual pages:

http://stackoverflow.com/questions/6839787/reading-a-particular-page-from-a-pdf-document-using-pdfbox
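
In the same vein, PDFBox can split a document into one PDDocument per page,
which you could then run through a text stripper or Tika individually. A rough
sketch, assuming PDFBox 1.8.x (the file name is a placeholder):

import java.io.File;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.Splitter;

PDDocument pdf = PDDocument.load(new File("sample.pdf"));
List<PDDocument> pages = new Splitter().split(pdf);  // one document per page
for (PDDocument page : pages) {
    // extract text / index the single-page document here
    page.close();
}
pdf.close();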

-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Wednesday, April 2, 2014 3:35 PM
To: solr-user@lucene.apache.org
Subject: Re: PDF Indexing

Hi Sujatha,

There is no built-in mechanism. Prepare per-page documents outside of Solr:
http://searchhub.org/2012/02/14/indexing-with-solrj/


And you may want to save the extracted text content somewhere too. If you change
something in the index analysis/schema, you need to reindex. If you have saved
the text data, you can at least skip the extraction phase.



Ahmet



On Wednesday, April 2, 2014 10:05 PM, Sujatha Arun  
wrote:

Hi,

I am able to use Tika and DIH to index a PDF as a single document. However,
I need each page to be a single document. Is there any inbuilt mechanism to
achieve this, or do I have to use PDFBox or some other tool to achieve it?

Regards 



Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Jack Krupansky
And, if you use the pf, pf2, and pf3 parameters of edismax, with boosting, 
you can ensure that the closest matches always appear first.


That is assuming you do index-time synonym expansion.
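
A rough sketch of such a request -- the field name and boost values here are
assumptions for illustration, not from this thread:

q=fort st john&defType=edismax&qf=place_name&pf=place_name^50&pf2=place_name^20&pf3=place_name^30

pf boosts documents matching the whole phrase, while pf2 and pf3 boost adjacent
word pairs and triples, so exact and near-exact matches sort to the top.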

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Wednesday, April 2, 2014 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

No, there isn't a tokenizer that'll do what you want that I know
about. Really, I suspect you need to back up a bit and re-think the
problem. It looks to me like you've taken a path that's going to cause
you endless grief when, as Jack says, phrase searches are built in to
the tokenization process.

Best,
Erick


On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky  
wrote:

Query by phrase is a core feature of tokenized text in Lucene and Solr, so
there is no need to use a pattern token filter for that purpose. And yes,
doing so pretty much breaks most token filters that would assume that the
text is tokenized.

-- Jack Krupansky

-Original Message- From: solr-user
Sent: Wednesday, April 2, 2014 12:46 PM
To: solr-user@lucene.apache.org

Subject: Re: how do I get search for "fort st john" to match "ft saint 
john"


Hi Eric.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now)

Since the PatternTokenizerFactory is not tokenizing on whitespace (by 
design

since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", 
"former",
"fort" and "ord"), and so the SynonymFilterFactory does not create 
synonyms

for them (by design)

the original question remains: is there a tokenizer/plugin that will allow
me to synonym words in a unbroken phrase?

note: the reason I don't want to tokenize the data by whitespace is that it
would cause way too many results to get returned if I, for example, search 
on

"new" or "st" ...  However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st 
john"

or ...



--
View this message in context:
http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
Sent from the Solr - User mailing list archive at Nabble.com. 




Re: Flush buffer exceptions

2014-04-02 Thread Toke Eskildsen
On Wed, 2014-04-02 at 15:45 +0200, ku3ia wrote:
> ERROR - 2014-03-27 17:11:15.022; org.apache.solr.common.SolrException;
> null:org.eclipse.jetty.io.EofException
> at
> org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
[...]
> java.lang.IllegalStateException: Committed
> at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
> ...
> 
> Does anyone have some ideas about this?

Looks like your client timed out or lost the connection somehow. The
issued Solr job is unaffected by this until it tries to deliver the
result. When it does, it fails with an error like the one above.

If this is for a search, it is an indicator that your server responds
too slowly. If this is an index update, you should increase the timeouts in
your client.
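
With SolrJ, for example, the client-side timeouts can be raised along these
lines (the URL and values are arbitrary placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
server.setConnectionTimeout(10000);  // ms to establish the TCP connection
server.setSoTimeout(120000);         // ms to wait for a response (socket read)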

- Toke Eskildsen, State and University Library, Denmark



Re: Flush buffer exceptions

2014-04-02 Thread Alexandre Rafalovitch
Is your Solr talking directly to a web browser or to a client app? If to
a browser, the user may have just closed the window. If to a client, you
need to check the timeouts, crashes, etc.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Thu, Apr 3, 2014 at 1:40 PM, Toke Eskildsen  wrote:
> On Wed, 2014-04-02 at 15:45 +0200, ku3ia wrote:
>> ERROR - 2014-03-27 17:11:15.022; org.apache.solr.common.SolrException;
>> null:org.eclipse.jetty.io.EofException
>> at
>> org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
> [...]
>> java.lang.IllegalStateException: Committed
>> at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
>> ...
>>
>> Does anyone have some ideas about this?
>
> Looks like your client timed out or lost the connection somehow. The
> issued Solr job is unaffected by this until it tries to deliver the
> result. When it does, it fails with an error like the one above.
>
> If this is for a search, it is an indicator that your server responds
>> too slowly. If this is an index update, you should increase timeouts in
> your client.
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: sort by an attribute values sequence

2014-04-02 Thread santosh sidnal
Hi Ahmet/All,

Thanks for the reply.

The solution of boosting those producttype values will work fine if I
don't apply any 'sort'.

But my requirement is that sorting is applied and that particular
attribute values (C, B, etc.) are still boosted in the sorted result,
which is not working. It looks like sorting takes precedence over
boosting. Correct me if I am wrong.

I am also trying a function query; it looks like I will face the same
problem there (see the sketch below).

We have only 4 values for the producttype attribute, but for different
keywords we have to use different attributes for sorting and boosting
the result; that we can manage in our application.
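
A minimal sketch of the function-query sort Ahmet suggests -- mapping each
producttype value to a rank and sorting on that (the field and values are from
this thread; the rank numbers are arbitrary):

sort=if(termfreq(producttype,'C'),4,if(termfreq(producttype,'B'),3,if(termfreq(producttype,'A'),2,1))) desc

termfreq() works against indexed terms, so this assumes producttype is indexed
with the literal values A/B/C/D. A sort like this enforces the fixed C, B, A, D
ordering regardless of relevance score.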


Regards,
Santosh


On Wed, Apr 2, 2014 at 8:49 PM, Ahmet Arslan  wrote:

> Hi,
>
> How many distinct producttype do you have?
>
> May be
>
> q=C^5000 OR B^4000 OR A^3000 OR D&df=producttype
>
> could work.
>
> If you can come up with a function that takes its maximum value when
> producttype=C, etc., you can sort by function queries too.
> http://wiki.apache.org/solr/FunctionQuery
>
>
> Ahmet
>
>
> On Wednesday, April 2, 2014 1:52 PM, santosh sidnal <
> sidnal.sant...@gmail.com> wrote:
> Re-sending my e-mail. any pointers/ links for the issue will help me lot.
>
> Thanks in advance.
>
>
> On Tue, Apr 1, 2014 at 4:25 PM, santosh sidnal  >wrote:
>
> > Hi All,
> >
> > We have a specific requirement of sorting the products as per a specific
> > attribute value sequence. Any pointer or source of info would help us.
> >
> > Example of the scenario;
> >
> > Let's say for search result i want to sort results based on a attribute
> > producttype. Where producttype has following values, A, B, C, D.
> >
> > so while in solr query i can give either producttype asc, producttype
> desc.
> >
> > But I want get result in a specific way by saying first give me All
> > results of values 'C' then B, A, D.
> >
> >
> > --
> > Regards,
> > Santosh Sidnal
>
> >
> >
>
>
> --
> Regards,
> Santosh Sidnal
>
>


-- 
Regards,
Santosh Sidnal