Re: Is SOLR best suited to this application - Finding co-ordinates

2012-08-01 Thread Spadez
Normalising the data is a good idea, and it would be easy to do since I would
only have around 50,000 entries, BUT it is a bit complicated with addresses I
think. Let's say I store the data in this form:



London, England
Swindon, Wiltshire, England
Wiltshire England
England

What happens if someone searches just "London", or just "Swindon"? I assume
it wouldn't return any results, because they would have to type "London,
England", for example. If I include an entry for "London" and for "London, England",
then the autocomplete will show both, which would confuse the user.
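One way to sketch the normalised variant (a hedged example; every field name below is an assumption, not from the thread) is to keep the place hierarchy in separate fields and copy each component into a single suggestion field, so a search for just "London" or just "Swindon" still matches, while the stored components can be concatenated for display:

```xml
<!-- Sketch only: field names are assumptions -->
<field name="city"    type="string" indexed="true" stored="true"/>
<field name="county"  type="string" indexed="true" stored="true"/>
<field name="country" type="string" indexed="true" stored="true"/>

<!-- single field the autocomplete searches against -->
<field name="suggest" type="text_general" indexed="true" stored="false" multiValued="true"/>

<copyField source="city"    dest="suggest"/>
<copyField source="county"  dest="suggest"/>
<copyField source="country" dest="suggest"/>
```

The autocomplete could then display one entry per document, e.g. "Swindon, Wiltshire, England", built from the stored components, which avoids showing both "London" and "London, England" as separate suggestions.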




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-SOLR-best-suited-to-this-application-Finding-co-ordinates-tp3998308p3998547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr upgrade from 1.4 to 3.6

2012-08-01 Thread Chantal Ackermann
Hi Kalyan,

that is because SolrJ uses "javabin" as the format, which has class version numbers 
in the serialized objects that do not match. Set the format to XML (the "wt" 
parameter) and it will work (maybe JSON would, as well).

Chantal
 

Am 31.07.2012 um 20:50 schrieb Manepalli, Kalyan:

> Hi all,
> We are trying to upgrade our Solr instance from 1.4 to 3.6. We 
> use the SolrJ API to fetch data from the index. We see that the SolrJ 3.6 version is 
> not compatible with an index generated with 1.4.
> Is this a known issue, and is there a workaround?
> 
> Thanks,
> Kalyan Manepalli
> 



auto completion search with solr using NGrams in SOLR

2012-08-01 Thread aniljayanti
I want to implement an auto completion search with solr using NGrams. If the
user is searching for names of employees, then auto completion should be
applied, i.e.:

if the user types "j", show the names starting with "j"; if "ja", the names
starting with "ja"; if "jac", the names starting with "jac"; if "jack", the
names starting with "jack".

Below are my configuration settings in schema.xml. Please tell me if
anything is wrong.

Below is my field type configuration in schema.xml:

[schema.xml configuration stripped by the mailing list archive]
When I'm searching with the name "mado" or "madonna" I get employee names, but
when searching with "madon" I get no data.

Please help me on this.
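Since the schema snippet was stripped by the archive, here is a generic edge n-gram autocomplete field type for comparison (a sketch; the type name and gram sizes are assumptions, not the poster's actual settings). Note that if maxGramSize is smaller than the length of the typed prefix, longer prefixes such as "madon" stop matching even though shorter ones work:

```xml
<!-- Generic sketch of an autocomplete type; name and gram sizes are assumptions -->
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes m, ma, mad, mado, madon, madonn, madonna -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side deliberately has no n-gram filter -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```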


Thanks in Advance,

Anil.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Urgent: Facetable but not Searchable Field

2012-08-01 Thread jayakeerthi s
All,

We have a requirement, where we need to implement 2 fields as Facetable,
but the values of the fields should not be Searchable.

Please let me know: is this feature supported in Solr? If yes, what
configuration would need to be done in schema.xml and solrconfig.xml to
achieve this?

This is kind of urgent, as we need to rely on this functionality.


Thanks in advance,

Jay


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Michael Kuhlmann

On 01.08.2012 13:58, jayakeerthi s wrote:

We have a requirement, where we need to implement 2 fields as Facetable,
but the values of the fields should not be Searchable.


Simply don't search for it, then it's not searchable.

Or do I simply not understand your question? As long as Dismax doesn't 
have the attribute in its qf parameter, it's not getting searched.


Or, if the user has direct access to Solr, then she can search for the 
attribute. And can delete the index, or crash the server, if she likes.


So the short answer is: no. Facetable fields must be searchable. But 
usually, this is no problem.


-Kuli


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Yonik Seeley
On Wed, Aug 1, 2012 at 7:58 AM, jayakeerthi s  wrote:
> We have a requirement, where we need to implement 2 fields as Facetable,
> but the values of the fields should not be Searchable.

The user fields "uf" feature of the edismax parser may work for you:

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
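For illustration, wiring the "uf" restriction into a handler might look like the sketch below (the handler name and field names are assumptions; only fields listed in uf can be explicitly qualified by users, so a facet-only field left out of both qf and uf is effectively unsearchable through this handler):

```xml
<!-- Sketch: handler and field names are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- fields searched for bare terms -->
    <str name="qf">title description</str>
    <!-- fields users may qualify explicitly; facet-only fields are omitted -->
    <str name="uf">title description</str>
  </lst>
</requestHandler>
```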

-Yonik
http://lucidimagination.com


Re: auto completion search with solr using NGrams in SOLR

2012-08-01 Thread Markus Klose
Your configuration of the fieldtype looks quite ok.

In what field are you searching? "text"? "empname"? "autocomplete_text"?
If you are searching in "autocomplete_text", how do you add content to it? Is 
there another copyField statement? If you are searching in "text", what 
field type does that field have?

You can use the analysis.jsp (linked at the admin console) to check what 
happens with your content at index time and search time and if there is a match.
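If the intent is for autocomplete_text to be filled from empname, the usual mechanism is a copyField along these lines (the wiring itself is an assumption about the poster's schema; the field names are taken from the thread):

```xml
<!-- Assumed wiring: copy the employee name into the autocomplete field -->
<copyField source="empname" dest="autocomplete_text"/>
```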

Best regards from Augsburg

Markus Klose
SHI Elektronische Medien GmbH 
 

-Original Message-
From: aniljayanti [mailto:anil.jaya...@gmail.com] 
Sent: Wednesday, 1 August 2012 12:05
To: solr-user@lucene.apache.org
Subject: auto completion search with solr using NGrams in SOLR

I want to implement an auto completion search with solr using NGrams. If the 
user is searching for names of employees, then auto completion should be 
applied, i.e.:

if the user types "j", show the names starting with "j"; if "ja", the names 
starting with "ja"; if "jac", the names starting with "jac"; if "jack", the 
names starting with "jack".

Below are my configuration settings in schema.xml. Please tell me if anything 
is wrong.

Below is my field type configuration in schema.xml:

[schema.xml configuration stripped by the mailing list archive]

When I'm searching with the name "mado" or "madonna" I get employee names, but 
when searching with "madon" I get no data.

Please help me on this.


Thanks in Advance,

Anil.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559.html
Sent from the Solr - User mailing list archive at Nabble.com.


termFrequncy off and still use fastvector highlighter?

2012-08-01 Thread abhayd
Hi,
We would like to turn off TF (term frequency) for a field, but we still want
to use the fast vector highlighter.

How would we do that?
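A hedged sketch of the field attributes involved (the field name and type are assumptions): omitTermFreqAndPositions drops frequencies and positions from the postings, while the FastVectorHighlighter reads positions and offsets from the term vectors, which are stored separately. The combination below is therefore the natural thing to try, though whether it behaves as desired should be verified:

```xml
<!-- Sketch: field name and type are assumptions -->
<field name="body" type="text" indexed="true" stored="true"
       omitTermFreqAndPositions="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```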




--
View this message in context: 
http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Jack Krupansky
The "indexed" and "stored" field attributes are independent, so you can 
define a facet field as "stored" but not "indexed" (stored="true" 
indexed="false"), so that the field can be faceted but not indexed.


In addition, you can also use a copyField to copy the original values for an 
indexed field (before the values get analyzed and transformed to be placed 
in the index as terms) to a stored field to facet them (or vice versa).


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, August 01, 2012 6:58 AM
To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org ; 
solr-dev-h...@lucene.apache.org

Subject: Urgent: Facetable but not Searchable Field

All,

We have a requirement, where we need to implement 2 fields as Facetable,
but the values of the fields should not be Searchable.

Please let me know: is this feature supported in Solr? If yes, what
configuration would need to be done in schema.xml and solrconfig.xml to
achieve this?

This is kind of urgent, as we need to rely on this functionality.


Thanks in advance,

Jay 



Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Michael Kuhlmann

On 01.08.2012 15:40, Jack Krupansky wrote:

The "indexed" and "stored" field attributes are independent, so you can
define a facet field as "stored" but not "indexed" (stored="true"
indexed="false"), so that the field can be faceted but not indexed.


?

A field must be indexed to be used for faceting.

-Kuli


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Jack Krupansky
Oops. Obviously facet fields must be indexed. Not sure what I was thinking 
at the moment.


-- Jack Krupansky

-Original Message- 
From: Michael Kuhlmann

Sent: Wednesday, August 01, 2012 8:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Urgent: Facetable but not Searchable Field

On 01.08.2012 15:40, Jack Krupansky wrote:

The "indexed" and "stored" field attributes are independent, so you can
define a facet field as "stored" but not "indexed" (stored="true"
indexed="false"), so that the field can be faceted but not indexed.


?

A field must be indexed to be used for faceting.

-Kuli 



Cloud and cores

2012-08-01 Thread Pierre GOSSÉ
Hi all,

I'm playing around with SolrCloud and followed indications I found at 
http://wiki.apache.org/solr/SolrCloud/

-  Started Instance 1 with embedded zk

-  Started Instances 2 3 and 4 using Instance 1 as zk server.

Everything works fine.

Then, using CoreAdmin, I add a second core in collection1 for Instances 1 and 3 
... everything is OK in the admin GUI, meaning that the graph shows 2 shards of 
3 server addresses each, with the instances that have 2 cores appearing twice on the graph.

collection1  shard1  wks-pge:7574
wks-pge:8900
wks-pge:8983
shard2  wks-pge:8983
wks-pge:7500
wks-pge:8900

On Instances 1 and 3 I see both cores, at the bottom of the left column and 
in the CoreAdmin screen.

I restart everything, and find the server in what seems to be an inconsistent 
state: i.e. the graph still shows 2 shards of 3 server addresses, but CoreAdmin 
no longer shows my additional cores.

Is there a problem in SolrCloud or CoreAdmin, or did I just do something stupid 
here ? :)

Pierre



Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Hi,
how can I map this complex data structure in Solr?

Document
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Chapter
 - Chapter_Title
 - Chapter_Content


Or

Product
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Articles
 - Artilce_ID
 - Artilce_Color
 - Artilce_Size

Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Jack Krupansky
The general rule is to flatten the structures. You have a choice between 
sharing common fields between tables, such as "title", or adding a 
prefix/suffix to qualify them, such as "document_title" vs. "product_title".


You also have the choice of storing different tables in separate Solr 
cores/collections, but then you have the burden of querying them separately 
and coordinating the separate results on your own. It all depends on your 
application.


A lot hinges on:

1. How do you want to search the data?
2. How do you want to access the fields once the Solr documents have been 
identified by a query - such as fields to retrieve, "join", etc.


So, once the data is indexed, what are your requirements for accessing the 
data? E.g., some sample pseudo-queries and the fields you want to access.
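As an illustrative sketch of the prefix/suffix option (every name below is an assumption), the first structure could flatten into fields like:

```xml
<!-- Flattened fields; all names are illustrative assumptions -->
<field name="group_id"        type="string" indexed="true" stored="true" multiValued="true"/>
<field name="group_name"      type="text"   indexed="true" stored="true" multiValued="true"/>
<field name="document_title"  type="text"   indexed="true" stored="true"/>
<field name="chapter_title"   type="text"   indexed="true" stored="true" multiValued="true"/>
<field name="chapter_content" type="text"   indexed="true" stored="true" multiValued="true"/>
```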


-- Jack Krupansky

-Original Message- 
From: Thomas Gravel

Sent: Wednesday, August 01, 2012 9:52 AM
To: solr-user@lucene.apache.org
Subject: Map Complex Datastructure with Solr

Hi,
how can I map this complex data structure in Solr?

Document
   - Groups
- Group_ID
- Group_Name
- .
  - Title
  - Chapter
- Chapter_Title
- Chapter_Content


Or

Product
   - Groups
- Group_ID
- Group_Name
- .
  - Title
  - Articles
- Artilce_ID
- Artilce_Color
- Artilce_Size

Thanks for ideas 



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Robert Muir
On Tue, Jul 31, 2012 at 2:34 PM, roz dev  wrote:
> Hi All
>
> I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
> when we are indexing lots of data with 16 concurrent threads, Heap grows
> continuously. It remains high and ultimately most of the stuff ends up
> being moved to Old Gen. Eventually, Old Gen also fills up and we start
> getting into excessive GC problem.

Hi: I don't claim to know anything about how tomcat manages threads,
but really you shouldn't have all these objects.

In general snowball stemmers should be reused per-thread-per-field.
But if you have a lot of fields*threads, especially if there really is
high thread churn on tomcat, then this could be bad with snowball:
see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

I think it would be useful to see if you can tune tomcat's threadpool
as he describes.

Separately: Snowball stemmers are currently really RAM-expensive for
stupid reasons.
Each one creates a ton of Among objects; e.g., an EnglishStemmer today
is about 8KB.

I'll regenerate these and open a JIRA issue, as the snowball code
generator in their svn was improved
recently; each one now takes about 64 bytes instead (the Amongs
are static and reused).

Still, this won't really "solve your problem", because the analysis
chain could have other heavy parts
in initialization, but it seems good to fix.

As a workaround until then, you can also just use the "good old
PorterStemmer" (PorterStemFilterFactory in Solr).
It's not exactly the same as using Snowball(English), but it's pretty
close and also much faster.
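The swap is a one-line change in the analyzer chain, roughly like this (a generic sketch, not the poster's actual field type):

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- instead of:
       <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  <filter class="solr.PorterStemFilterFactory"/>
</analyzer>
```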

-- 
lucidimagination.com


RE: Cloud and cores

2012-08-01 Thread Pierre GOSSÉ
It may have something to do with SOLR-3425, but I'm not that sure it fits.

I made some more tests.

Case 1 : with SolrCloud
I can create a new core on one of the servers via the admin GUI or via a CREATE 
directive in the URL. The data folder is created (but no conf folder; I believe the zk 
conf is used). However, ./solr/solr.xml is not updated with the new core 
parameters.
If I restart the server, the core is lost (but the data folder is kept).

Case 2 : on a single solr server
Creation of a new core fails via the GUI with this error:
GRAVE: org.apache.solr.common.SolrException: Error executing default 
implementation of CREATE
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:396)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:141)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)

at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
classpath or 'solr\core2\conf/', cwd=F:\solr-4.0\Test
at 
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:294)
at 
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:260)
at org.apache.solr.core.Config.(Config.java:111)
at org.apache.solr.core.Config.(Config.java:78)
at org.apache.solr.core.SolrConfig.(SolrConfig.java:117)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:391)
... 29 more

Using a URL CREATE and giving relative paths for solrconfig.xml and schema.xml 
fails later, on stopwords.txt.

Again solr/solr.xml is not updated, but the runtime exception could explain 
that in this case.

Pierre
-Original Message-
From: Pierre GOSSÉ [mailto:pierre.go...@arisem.com] 
Sent: Wednesday, 1 August 2012 16:22
To: solr-user@lucene.apache.org
Subject: Cloud and cores

Hi all,

I'm playing around with SolrCloud and followed indications I found at 
http://wiki.apache.org/solr/SolrCloud/

-  Started Instance 1 with embedded zk

-  Started Instances 2 3 and 4 using Instance 1 as zk server.

Everything works fine.

Then, using CoreAdmin, I add a second core in collection1 for Instance 1 and 3 
... everything is ok in the admin GUI, meaning that the graph show 2 shards of 
3 server addresses each, those having 2 cores s

StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
I have a field type like the following:

[field type configuration stripped by the mailing list archive]


This type is behaving differently in Solr 3.3 and 3.6. In 3.3, the following
doesn't return any records, because there is no author called 'Gerri Killis'
(but there is an author called 'Gerri Jonathan'):

/select/?q=author:Gerri\ Killis

In 3.6, the same query returns records, because there is an author called
'Gerri Jonathan'. So is something wrong in 3.6? I didn't expect any records
here, because there is no author called 'Gerri Killis'.


Your help is appreciated.

Thanks
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Thanks for the answer.

I have to explain where the problem is...

In a shop solution you may have products and articles.
The product is the parent of all the articles...

in json...

{
   "product_name": "tank top",
   "article_list": [
 {
  "color": "red",
  "price": 10.99,
  "size": "XL",
  "inStore": true
 },
 {
  "color": "blue",
  "price": 15.99,
  "size": "XL",
  "inStore": false
 }
   ]
}

the problem is not the search (I think, because you can use
copyField), but the search results...

I have read about the possibility of creating my own FieldTypes, but I don't know
if this is the answer to my issues...

2012/8/1 Jack Krupansky :
> The general rule is to flatten the structures. You have a choice between
> sharing common fields between tables, such as "title", or adding a
> prefix/suffix to qualify them, such as "document_title" vs. "product_title".
>
> You also have the choice of storing different tables in separate Solr
> cores/collections, but then you have the burden of querying them separately
> and coordinating the separate results on your own. It all depends on your
> application.
>
> A lot hinges on:
>
> 1. How do you want to search the data?
> 2. How do you want to access the fields once the Solr documents have been
> identified by a query - such as fields to retrieve, "join", etc.
>
> So, once the data is indexed, what are your requirements for accessing the
> data? E.g., some sample pseudo-queries and the fields you want to access.
>
> -- Jack Krupansky
>
> -Original Message- From: Thomas Gravel
> Sent: Wednesday, August 01, 2012 9:52 AM
> To: solr-user@lucene.apache.org
> Subject: Map Complex Datastructure with Solr
>
>
> Hi,
> how can I map these complex Datastructure in Solr?
>
> Document
>- Groups
> - Group_ID
> - Group_Name
> - .
>   - Title
>   - Chapter
> - Chapter_Title
> - Chapter_Content
>
>
> Or
>
> Product
>- Groups
> - Group_ID
> - Group_Name
> - .
>   - Title
>   - Articles
> - Artilce_ID
> - Artilce_Color
> - Artilce_Size
>
> Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Alexandre Rafalovitch
Sorry, that did not explain the problem, just more info about data
layout. What are you actually trying to get out of SOLR?

Are you saying you want the parent's details repeated in every entry? Or are
you saying that you want to be able to find entries and, from there,
be able to find the specific parent?

Whatever you do, SOLR will return you a list of flat entries plus some
statistics on occurrences and facets. Given that, what would you like
to see?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel  wrote:
> Thanks for the answer.
>
> Ich have to explain, where the problem is...
>
> you may have at the shop solutions products and articles.
> The product is the parent of all articles...
>
> in json...
>
> {
>"product_name": "tank top",
>"article_list": [
>  {
>   "color": "red",
>   "price": 10.99,
>   "size": "XL",
>   "inStore": true
>  },
>  {
>   "color": "blue",
>   "price": 15.99,
>   "size": "XL",
>   "inStore": false
>  }
>]
> }
>
> the problem is not the search (i think, because you can use
> copyField), but the searchresults...
>
> I have read the possibility to create own FieldTypes, but I don't know
> if this is the answer of my issues...
>
> 2012/8/1 Jack Krupansky :
>> The general rule is to flatten the structures. You have a choice between
>> sharing common fields between tables, such as "title", or adding a
>> prefix/suffix to qualify them, such as "document_title" vs. "product_title".
>>
>> You also have the choice of storing different tables in separate Solr
>> cores/collections, but then you have the burden of querying them separately
>> and coordinating the separate results on your own. It all depends on your
>> application.
>>
>> A lot hinges on:
>>
>> 1. How do you want to search the data?
>> 2. How do you want to access the fields once the Solr documents have been
>> identified by a query - such as fields to retrieve, "join", etc.
>>
>> So, once the data is indexed, what are your requirements for accessing the
>> data? E.g., some sample pseudo-queries and the fields you want to access.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Thomas Gravel
>> Sent: Wednesday, August 01, 2012 9:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Map Complex Datastructure with Solr
>>
>>
>> Hi,
>> how can I map these complex Datastructure in Solr?
>>
>> Document
>>- Groups
>> - Group_ID
>> - Group_Name
>> - .
>>   - Title
>>   - Chapter
>> - Chapter_Title
>> - Chapter_Content
>>
>>
>> Or
>>
>> Product
>>- Groups
>> - Group_ID
>> - Group_Name
>> - .
>>   - Title
>>   - Articles
>> - Artilce_ID
>> - Artilce_Color
>> - Artilce_Size
>>
>> Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Hm, OK, I think I have to write out my example data, the queries I want
to make, and the response I expect...

Data:

{
"product_id": "xyz76",
"product_name": "tank top",
"brand": "adidas",
"description":"this is the long description of the product",
"short_description":"this is the short description of the product",
"product_image":"/images/tanktop.jpg",
"product_image":"/images/tanktop2.jpg",
"article_list": [
{
"article_number": "TR47",
"color": "red",
"price": 10.99,
"size": "XL",
"unit": "1 piece",
"inStore": true
},
{
"article_number": "TR48",
"color": "blue",
"price": 15.99,
"size": "XL",
"unit": "1 piece",
"inStore": false
}
]
}

I want to search:
- article_number (e.g. with inStore = true)
- color
- description
- short_description
- product_name

Facets:
- brand
- color
- size
- price

example query-response
{
  "responseHeader":{
"status":0,
"QTime":2,
"params":{
  "indent":"on",
  "start":"0",
  "q":"IBProductName:Durch*",
  "wt":"json",
  "version":"2.2",
  "rows":"10"}},
  "response":{"numFound":1,"start":0,"docs":[
  {
"product_id": "xyz76",
"product_name": "tank top",
"brand": "adidas",
"description":"this is the long description of the product",
"short_description":"this is the short description of the product",
"product_image":"/images/tanktop.jpg",
"product_image":"/images/tanktop2.jpg",
"article_list": [
{
"color": "red",
"price": 10.99,
"size": "XL",
"unit": "1 piece",
"inStore": true
},
{
"color": "blue",
"price": 15.99,
"size": "XL",
"unit": "1 piece",
"inStore": false
}
]

}
]
  }}


2012/8/1 Alexandre Rafalovitch :
> Sorry, that did not explain the problem, just more info about data
> layout. What are you actually trying to get out of SOLR?
>
> Are you saying you want parent's details repeated in every entry? Are
> you saying that you want to be able to find entries and from there,
> being able to find specific parent.
>
> Whatever you do, SOLR will return you a list of flat entries plus some
> statistics on occurrences and facets. Given that, what would you like
> to see?
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel  
> wrote:
>> Thanks for the answer.
>>
>> Ich have to explain, where the problem is...
>>
>> you may have at the shop solutions products and articles.
>> The product is the parent of all articles...
>>
>> in json...
>>
>> {
>>"product_name": "tank top",
>>"article_list": [
>>  {
>>   "color": "red",
>>   "price": 10.99,
>>   "size": "XL",
>>   "inStore": true
>>  },
>>  {
>>   "color": "blue",
>>   "price": 15.99,
>>   "size": "XL",
>>   "inStore": false
>>  }
>>]
>> }
>>
>> the problem is not the search (i think, because you can use
>> copyField), but the searchresults...
>>
>> I have read the possibility to create own FieldTypes, but I don't know
>> if this is the answer of my issues...
>>
>> 2012/8/1 Jack Krupansky :
>>> The general rule is to flatten the structures. You have a choice between
>>> sharing common fields between tables, such as "title", or adding a
>>> prefix/suffix to qualify them, such as "document_title" vs. "product_title".
>>>
>>> You also have the choice of storing different tables in separate Solr
>>> cores/collections, but then you have the burden of querying them separately
>>> and coordinating the separate results on your own. It all depends on your
>>> application.
>>>
>>> A lot hinges on:
>>>
>>> 1. How do you want to search the data?
>>> 2. How do you want to access the fields once the Solr documents have been
>>> identified by a query - such as fields to retrieve, "join", etc.
>>>
>>> So, once the data is indexed, what are your requirements for accessing the
>>> data? E.g., some sample pseudo-queries and the fields you want to access.
>>>
>>> -- Jack Krupansky
>>>
>>> 

Exact match on few fields, fuzzy on others

2012-08-01 Thread Pranav Prakash
Hi Folks,

I am using Solr 3.4 and my document schema has attributes - title,
transcript, author_name. Presently, I am using DisMax to search for a user
query across transcript. I would also like to do an exact search on
author_name so that for a query "Albert Einstein", I would want to get all
the documents which contain Albert or Einstein in transcript and also those
documents which have author_name exactly as 'Albert Einstein'.

Can we do this with the dismax query parser? The schema for both fields is
below:

[schema.xml field definitions stripped by the mailing list archive]

--
*Pranav Prakash*

"temet nosce"
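One hedged way to sketch this (the author_exact field and the boost value are assumptions, not part of the original schema): keep an untokenized copy of the author name and give it a high weight in qf, so the query matches term-by-term in transcript while only the full value "Albert Einstein" matches the string copy:

```xml
<!-- Assumption-based sketch: an untokenized copy of author_name -->
<field name="author_exact" type="string" indexed="true" stored="false"/>
<copyField source="author_name" dest="author_exact"/>
```

The handler would then use something like qf=transcript author_exact^5 (the boost value is illustrative).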


4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
Hello all,

I am running 4.0 alpha and have encountered something I am unable to
explain. I am indexing content to a master server, and the data is
replicating to a slave. The odd part is that when searching through the UI,
no documents show up on master with a standard *:* query. All cache types
are set to zero. I know indexing is working because I am watching the logs
and I can see documents getting added, not to mention the data is written
to the filesystem. I have autocommit set to 60000 ms (1 minute), so it isn't a
commit issue.

The very strange part is that the slave is correctly replicating the data,
and it is searchable in the UI on the slave (but not master). I don't
understand how/why the data is visible on the slave and not visible on the
master. Does anyone have any thoughts on this or seen it before?

Thanks in advance!
Briggs


Solr spellcheck for words with quotes

2012-08-01 Thread Shri Kanish
Hi,
I use Solr as the search engine for our application. We have a title "Pandora's 
star". When I give a query as 
http://localhost:8983/solr/select?q=pandora's star&spellcheck=true 
&spellcheck.collate=true
 
I get a response whose XML markup was stripped by the archive; the surviving
values show one suggestion for the misspelled word (startOffset 10, endOffset
17), the suggested term "pandora's", and the collation:

text_engb:pandora's's star
The word goes as pandora and not as pandora's: an additional 's is appended to 
the collation result. Below is my configuration for spellcheck:

[spellcheck configuration stripped by the mailing list archive]
Please suggest
 
Thanks,
Shri
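Since the configuration was stripped by the archive, here is a generic spellcheck component sketch for orientation (the names are assumptions). The apostrophe behavior usually comes from the analyzer of the field the dictionary is built on, so that field type is the place to look:

```xml
<!-- Generic sketch; the field name "spell" is an assumption -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- the analyzer of this field decides how "pandora's" is tokenized -->
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```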

Re: 4.0 Strange Commit/Replication Issue

2012-08-01 Thread Tomás Fernández Löbbe
Could your autocommit in the master be using "openSearcher=false"? If you
go to the Master admin, do you see that the searcher has all the segments
that you see in the filesystem?
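The setting Tomás is referring to looks like this in solrconfig.xml (a sketch; the 60-second value matches the interval mentioned in the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <!-- flushes segments to disk (so replication works) but does NOT
         open a new searcher, so documents stay invisible on the master -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```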



On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson  wrote:

> Hello all,
>
> I am running 4.0 alpha and have encountered something I am unable to
> explain. I am indexing content to a master server, and the data is
> replicating to a slave. The odd part is that when searching through the UI,
> no documents show up on master with a standard *:* query. All cache types
> are set to zero. I know indexing is working because I am watching the logs
> and I can see documents getting added, not to mention the data is written
> to the filesystem. I have autocommit set to 6 (1 minute) so it isn't a
> commit issue.
>
> The very strange part is that the slave is correctly replicating the data,
> and it is searchable in the UI on the slave (but not master). I don't
> understand how/why the data is visible on the slave and not visible on the
> master. Does anyone have any thoughts on this or seen it before?
>
> Thanks in advance!
> Briggs
>


Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
I noticed that the escape character in the query is getting ignored in Solr
3.6.

For the following query, 3.3 gives results where 'Featuring Chimp' is matched,
but 3.6 gives results where Featuring, Chimp, or 'Featuring Chimp' is
matched. Any idea what the difference between my 3.3 and 3.6 environments is
that causes these inconsistent results?

/select/?q=title:Featuring\ Chimp



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
That is the problem. I wasn't aware of that new feature in 4.0. Thanks for
the quick response Tomás.

-Briggs

On Wed, Aug 1, 2012 at 3:08 PM, Tomás Fernández Löbbe  wrote:

> Could your autocommit in the master be using "openSearcher=false"? If you
> go to the Master admin, do you see that the searcher has all the segments
> that you see in the filesystem?
>
>
>
> On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson <
> w.briggs.thomp...@gmail.com
> > wrote:
>
> > Hello all,
> >
> > I am running 4.0 alpha and have encountered something I am unable to
> > explain. I am indexing content to a master server, and the data is
> > replicating to a slave. The odd part is that when searching through the
> UI,
> > no documents show up on master with a standard *:* query. All cache types
> > are set to zero. I know indexing is working because I am watching the
> logs
> > and I can see documents getting added, not to mention the data is written
> > to the filesystem. I have autocommit set to 6 (1 minute) so it isn't
> a
> > commit issue.
> >
> > The very strange part is that the slave is correctly replicating the
> data,
> > and it is searchable in the UI on the slave (but not master). I don't
> > understand how/why the data is visible on the slave and not visible on
> the
> > master. Does anyone have any thoughts on this or seen it before?
> >
> > Thanks in advance!
> > Briggs
> >
>


Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread Jack Krupansky

Which query parser do you have set in your request handler?

There was a problem with edismax in 3.6 with the WordDelimiterFilter that 
sounds exactly like your symptom. The workaround is to enclose the terms in 
quotes (to make them a phrase); otherwise the terms would be "OR"ed rather 
than "AND"ed.


-- Jack Krupansky

-Original Message- 
From: raonalluri

Sent: Wednesday, August 01, 2012 3:25 PM
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

I noticed that the escape character in the query is getting ignored in Solr
3.6.

For the following query, 3.3 gives results where 'Featuring Chimp' is matched. But
in 3.6, it gives results where Featuring, Chimp, or Featuring Chimp is
matched. Any idea what the difference is between my 3.3 and 3.6 environments
that causes these inconsistent results?

/select/?q=title:Featuring\ Chimp



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
Jack, thanks a lot for your reply. We are using the LuceneQParser query parser. I
agree that if I phrase the string by adding double quotes, I am good.

But I am checking if there is any fix for this without changing the query.
As we are in a production environment, we would need to change the queries in
different places.

Can we escape from this issue by changing the query parser?

regards
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread Jack Krupansky
This may simply be a matter of changing the default query operator from "OR" 
to "AND". Try adding &q.op=AND to your request.
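If changing every client query is impractical, the same default can be applied server-side in solrconfig.xml (a sketch; the handler name is an assumption):

```xml
<!-- Sketch: force AND as the default operator in the handler's defaults,
     so the client queries do not need to change. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q.op">AND</str>
  </lst>
</requestHandler>
```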


-- Jack Krupansky

-Original Message- 
From: raonalluri

Sent: Wednesday, August 01, 2012 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

Jack, thanks a lot for your reply. We are using the LuceneQParser query parser.
I agree that if I phrase the string by adding double quotes, I am good.

But I am checking if there is any fix for this without changing the query.
As we are in a production environment, we would need to change the queries in
different places.

Can we escape from this issue by changing the query parser?

regards
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Exact match on few fields, fuzzy on others

2012-08-01 Thread Jack Krupansky
Try edismax with the PF2 option, which will automatically boost documents 
that contain occurrences of adjacent terms, as you have suggested.


See:
http://wiki.apache.org/solr/ExtendedDisMax
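A minimal edismax configuration using pf2 might look like this (a sketch; the handler name and field boosts are assumptions based on the fields mentioned in this thread):

```xml
<!-- Sketch: edismax handler where pf2 boosts documents containing pairs of
     adjacent query terms in the transcript field. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">transcript author_name</str>
    <str name="pf2">transcript^10</str>
  </lst>
</requestHandler>
```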

-- Jack Krupansky

-Original Message- 
From: Pranav Prakash

Sent: Wednesday, August 01, 2012 1:21 PM
To: solr-user@lucene.apache.org
Subject: Exact match on few fields, fuzzy on others

Hi Folks,

I am using Solr 3.4 and my document schema has attributes - title,
transcript, author_name. Presently, I am using DisMax to search for a user
query across transcript. I would also like to do an exact search on
author_name so that for a query "Albert Einstein", I would want to get all
the documents which contain Albert or Einstein in transcript and also those
documents which have author_name exactly as 'Albert Einstein'.

Can we do this with the dismax query parser? The schemas for both fields are
below:
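The schema XML did not survive the archive, but the two kinds of fields under discussion - an analyzed text field for token matching and an unanalyzed copy for exact matching - could be sketched as follows (all names and the analysis chain are assumptions):

```xml
<!-- Sketch: transcript is tokenized for per-term matching; author_name is
     copied into a raw string field so the whole value can match exactly.
     Names and the analysis chain are illustrative. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="transcript" type="text_general" indexed="true" stored="true"/>
<field name="author_name" type="text_general" indexed="true" stored="true"/>
<field name="author_name_exact" type="string" indexed="true" stored="false"/>
<copyField source="author_name" dest="author_name_exact"/>
```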





--
*Pranav Prakash*

"temet nosce" 



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread roz dev
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we would not use
it for now. If this still does not address our issue, we would tweak the thread
pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as
we would be reducing the thread pool, which can adversely impact our throughput.

If the Snowball filter is being optimized for the Solr 4 beta, then it would be
great for us. If you have already filed a JIRA for this, please let me
know; I would like to follow it.

Thanks again
Saroj





On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir  wrote:

> On Tue, Jul 31, 2012 at 2:34 PM, roz dev  wrote:
> > Hi All
> >
> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> that
> > when we are indexing lots of data with 16 concurrent threads, Heap grows
> > continuously. It remains high and ultimately most of the stuff ends up
> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> > getting into excessive GC problem.
>
> Hi: I don't claim to know anything about how tomcat manages threads,
> but really you shouldn't have all these objects.
>
> In general snowball stemmers should be reused per-thread-per-field.
> But if you have a lot of fields*threads, especially if there really is
> high thread churn on tomcat, then this could be bad with snowball:
> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>
> I think it would be useful to see if you can tune tomcat's threadpool
> as he describes.
>
> separately: Snowball stemmers are currently really ram-expensive for
> stupid reasons.
> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> is about 8KB.
>
> I'll regenerate these and open a JIRA issue: as the snowball code
> generator in their svn was improved
> recently and each one now takes about 64 bytes instead (the Among's
> are static and reused).
>
> Still this won't really "solve your problem", because the analysis
> chain could have other heavy parts
> in initialization, but it seems good to fix.
>
> As a workaround until then you can also just use the "good old
> PorterStemmer" (PorterStemFilterFactory in solr).
> It's not exactly the same as using Snowball(English) but it's pretty
> close and also much faster.
>
> --
> lucidimagination.com
>
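The workaround Robert describes - swapping the Snowball English stemmer for the classic Porter stemmer - would look roughly like this in schema.xml (a sketch; the field type name and the rest of the analysis chain are assumptions):

```xml
<!-- Sketch: same analysis chain, but with PorterStemFilterFactory in place
     of a SnowballPorterFilterFactory with language="English". -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```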


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Simon Willnauer
On Thu, Aug 2, 2012 at 7:53 AM, roz dev  wrote:
> Thanks Robert for these inputs.
>
> Since we do not really need the Snowball analyzer for this field, we would not use
> it for now. If this still does not address our issue, we would tweak the thread
> pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as
> we would be reducing the thread pool, which can adversely impact our throughput.
>
> If the Snowball filter is being optimized for the Solr 4 beta, then it would be
> great for us. If you have already filed a JIRA for this, please let me
> know; I would like to follow it.

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems fixed. Given the massive commit last night, it's already
committed and backported, so it will be in 4.0-BETA.

simon
>
> Thanks again
> Saroj
>
>
>
>
>
> On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir  wrote:
>
>> On Tue, Jul 31, 2012 at 2:34 PM, roz dev  wrote:
>> > Hi All
>> >
>> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
>> that
>> > when we are indexing lots of data with 16 concurrent threads, Heap grows
>> > continuously. It remains high and ultimately most of the stuff ends up
>> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
>> > getting into excessive GC problem.
>>
>> Hi: I don't claim to know anything about how tomcat manages threads,
>> but really you shouldn't have all these objects.
>>
>> In general snowball stemmers should be reused per-thread-per-field.
>> But if you have a lot of fields*threads, especially if there really is
>> high thread churn on tomcat, then this could be bad with snowball:
>> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>>
>> I think it would be useful to see if you can tune tomcat's threadpool
>> as he describes.
>>
>> separately: Snowball stemmers are currently really ram-expensive for
>> stupid reasons.
>> each one creates a ton of Among objects, e.g. an EnglishStemmer today
>> is about 8KB.
>>
>> I'll regenerate these and open a JIRA issue: as the snowball code
>> generator in their svn was improved
>> recently and each one now takes about 64 bytes instead (the Among's
>> are static and reused).
>>
>> Still this won't really "solve your problem", because the analysis
>> chain could have other heavy parts
>> in initialization, but it seems good to fix.
>>
>> As a workaround until then you can also just use the "good old
>> PorterStemmer" (PorterStemFilterFactory in solr).
>> It's not exactly the same as using Snowball(English) but it's pretty
>> close and also much faster.
>>
>> --
>> lucidimagination.com
>>