Parameter Dereferencing with function queries in solr json facet

2018-12-08 Thread Venu
I am using Solr 6.6 and trying to use parameter dereferencing with JSON
facets.

Each document can have multiple prices (assume sp, price1, price2, price3,
and price4 are the price fields on every document).

Based on the query, I have to take the minimum value over a combination of
those prices and build facets on it.

This can be done with frange, but is there a way to do it with JSON facets?
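For reference, the frange approach mentioned above would look something like
this (a sketch; field names taken from the examples below):

fq={!frange l=0 u=20}min(price1,price2,price3,sp)

frange accepts a function query as its body, which is exactly what a plain
range query does not allow on the left-hand side of ":".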

The sample query below works:

localhost:8983/solrcloud/myconf/select?q=apple&
json.facet={
'prices':{
'type': 'query',
'q': "${FIELD}:[0 TO 20]"
}
}
&FIELD=sp

But when I try to use something like this:

localhost:8983/solrcloud/myconf/select?q=apple&
json.facet={
'prices':{
'type': 'query',
'q': "${FIELD}:[0 TO 20]"
}
}&FIELD=min(price1, price2, price3, sp)

My other queries can be like the ones below:

localhost:8983/solrcloud/myconf/select?q=apple&
json.facet={
'prices':{
'type': 'query',
'q': "${FIELD}:[0 TO 20]"
}
}&FIELD=min(price1, price3, sp)

or

localhost:8983/solrcloud/myconf/select?q=apple&
json.facet={
'prices':{
'type': 'query',
'q': "${FIELD}:[0 TO 20]"
}
}&FIELD=min(price1, price2)

I can use any combination, like (price1, price2) or (price4, sp), etc.

I am getting the below error:

org.apache.solr.common.SolrException
org.apache.solr.parser.ParseException

org.apache.solr.search.SyntaxError: Cannot parse 'min(price1, price2,
price3):[0 TO 20]': Encountered ":" at line 1, column 12. Was expecting
one of: "+", "-", "(", "*", "^", "[", "{", "filter(", ...

400

Is it possible to use a function query while forming the facet query?
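The parse error happens because the dereferenced parameter is substituted
into a Lucene-syntax range query, where a function such as min(...) is not a
valid field name. A hedged workaround (a sketch, not a verified answer) is to
keep parameter dereferencing but let frange evaluate the function inside the
facet query:

json.facet={
  'prices': {
    'type': 'query',
    'q': "{!frange l=0 u=20 v=$FIELD}"
  }
}
&FIELD=min(price1,price2,sp)

Here v=$FIELD hands the function to the frange parser instead of the default
lucene parser.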



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Parameter Dereferencing with function queries in solr json facet

2018-12-09 Thread Venu
Thanks Mikhail 





Solr facet order same as result set

2020-04-20 Thread Venu
Hi
For a given query and sort order, Solr returns the result set (ordered by
score and the sort order) along with facets (ordered by descending bucket
count).

Is there any way to get the facets in the same order as the results/docs? I
tried with JSON facets, but I was not able to make it work.

In the example below, I sorted on multiple fields, say 'rank' and 'sales',
and the first doc is sku: 123456, but that SKU is not returned in the
facets; I want those SKUs to be part of the facets.

*Sample query:*
http://localhost:8983/solr/samplecollection/select?q=((group_id: ("g.0" OR
"g.1")) OR (!group_id: "g.46"))
AND
!(sku: 1000422 OR group_id: g.13) &json.facet={ sku: { type: terms, field:
sku, facet: { fc_ids: { type: terms, field: fc_id } } } }&sort=rank desc,
sales desc&fl=score&rows=3
*Sample Response:*
"response": {
  "numFound": 1998,
  "start": 0,
  "maxScore": 1.7779741,
  "docs": [
    {
      "sku": "123456",
      "group_id": "g.0",
      "id": "123456.0",
      "fc_id": "0",
      "_version_": 1664396609960542200,
      "score": 1.7779741
    },
    {
      "sku": "366222",
      "group_id": "g.0",
      "id": "366222.0",
      "fc_id": "0",
      "_version_": 1664396609963688000,
      "score": 1.7779741
    },
    {
      "sku": "1000425",
      "group_id": "g.0",
      "id": "1000425.0",
      "fc_id": "0",
      "_version_": 1664396609964736500,
      "score": 1.7779741
    }
  ]
},
"facets": {
  "count": 1998,
  "sku": {
    "buckets": [
      {
        "val": "1000425",
        "count": 2,
        "fc_ids": {
          "buckets": [
            { "val": "0", "count": 1 },
            { "val": "1", "count": 1 }
          ]
        }
      },
      {
        "val": "100253356",
        "count": 2,
        "fc_ids": {
          "buckets": [
            { "val": "0", "count": 1 },
            { "val": "1", "count": 1 }
          ]
        }
      }
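JSON facet buckets can be sorted by an aggregate instead of the default
count. A hedged sketch that orders the sku buckets by the same fields used
in the result sort (max_rank and max_sales are names introduced here; rank
and sales are assumed to be numeric docValues fields):

json.facet={
  sku: {
    type: terms,
    field: sku,
    sort: "max_rank desc",
    facet: {
      max_rank: "max(rank)",
      max_sales: "max(sales)",
      fc_ids: { type: terms, field: fc_id }
    }
  }
}

This only approximates the document order, since a bucket sort takes a
single criterion and does not break ties on a second field.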






Re: Solr facet order same as result set

2020-04-20 Thread Venu
Probably I haven't framed my question properly.

Consider the schema with the fields - id, sku, fc_id, group_id
The same SKU can be part of multiple documents with different fc_id and
group_id.

For a given search query, multiple documents having the same SKU will be
returned. Is there any way I can get all the fc_ids for the SKUs returned
in the result set? Or do I have to run a separate query with those SKUs to
fetch the fc_ids through JSON facets?

I am fetching the fc_ids through JSON facets, but the order returned from
the facets is different from the result set.





Search on Nested Documents

2020-04-26 Thread Venu
Hi
I have gone through
https://lucene.apache.org/solr/guide/8_5/searching-nested-documents.html, and
the examples suggest searching on a field of either the parent or the child
via the block join parsers.
Is there a way to search on the combined text of the parent and child
documents?

Assume below is the document I am indexing into Solr.
Is there a way I can search for "blue t-shirt" and retrieve the id:3
document along with the remaining siblings? Similarly, for a search of
"brown t-shirt", the id:4 document should be returned with the remaining
siblings.


{
  id: 1,
  product: "Awesome T-Shirt",
  _child_documents: [
    {
      id: 2,
      color: "Red",
      size: "L"
    },
    {
      id: 3,
      color: "Blue",
      size: "M"
    },
    {
      id: 4,
      color: "brown",
      size: "XL"
    }
  ]
}
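One hedged sketch of a common approach: match the child fields through a
child-to-parent block join and pull the siblings back with the [child]
transformer (assumes product is a parent-only field and color a child-only
field, as in the document above):

q=+product:t-shirt +_query_:"{!parent which='product:*' v='color:Blue'}"
&fl=*,[child]

This ANDs parent text with child text at the parent level. A single query
clause cannot span the combined text of parent and child directly, so the
other common workaround is copying child text into a catch-all field on the
parent at index time.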






Normalized score

2020-05-19 Thread Venu
Hi
Is it possible to normalize the per-field score before applying the boosts?

Let's say 2 documents match my search criteria on the query fields *title*
and *description* using the dismax parser with individual boosts:

q=cookie&qf=title^2 description^1

Let's say below are the TF-IDF scores for the documents:
Doc1: title - 2, description - 4
Doc2: title - 2.5, description - 3

The idea is to normalize with the max value before applying the boost:

Doc1: title - (2/4) * 2 (boost of title), description - (4/4) * 1 (boost of
description)
Doc2: title - (2.5/4) * 2, description - (3/4) * 1

Is this possible? Or do I need to do re-ranking/LTR here?

Can you guys please suggest if this is doable?
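One hedged way to experiment without LTR is to build the score by hand from
per-field function queries (qt1/qd1 are parameter names introduced here;
scale() normalizes over all matching documents, so this only approximates
"divide by the per-field max"):

q={!func}sum(product(2,scale(query($qt1),0,1)),product(1,scale(query($qd1),0,1)))
&qt1={!dismax qf=title v='cookie'}
&qd1={!dismax qf=description v='cookie'}

Note that scale() can be expensive, since it has to compute the inner
function for all documents to find the min and max.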






Solr multi word search across multiple fields with mm

2020-07-07 Thread Venu
Hi 
We observed that multi-word queries spanning multiple fields with mm create
a problem. Any help would be appreciated.

Current problem:
Searching on words that span different fields with minimum match (mm) and
sow=false generates a field-centric query with per-field mm, rather than a
term-centric query with mm across fields, when a field undergoes different
query-time analysis (multi-word synonyms, stop words, etc.).

Below are the sample field and term centric queries:

*term centric query with the query string as "amul cheese slice" (none of
the terms has synonyms):*

"parsedquery_toString": "+((((description:amul)^6.0 | description_l2:amul |
(description_l1:amul)^4.0 | (brand_name_h:amul)^8.0 |
(manual_tags:amul)^3.0) ((description:cheese)^6.0 | description_l2:cheese |
(description_l1:cheese)^4.0 | (brand_name_h:cheese)^8.0 |
(manual_tags:cheese)^3.0) ((description:slice)^6.0 | description_l2:slice |
(description_l1:slice)^4.0 | (brand_name_h:slice)^8.0 |
(manual_tags:slice)^3.0))~2)",

*field centric query with the query string as "amul cheese cake" (cake has a
synonym of plum cake):*

"parsedquery_toString": "+(((description:amul description:cheese
description:cake)~2)^6.0 | ((description_l2:amul description_l2:cheese
(description_l2:cupcak description_l2:pastri (+description_l2:plum
+description_l2:cake) description_l2:cake))~2) | ((description_l1:amul
description_l1:cheese description_l1:cake)~2)^4.0 | ((brand_name_h:amul
brand_name_h:cheese brand_name_h:cake)~2)^8.0 | ((manual_tags:amul
manual_tags:cheese manual_tags:cake)~2)^3.0)",


Referring to multiple blogs, we tried the following:
1. autoGeneratePhraseQueries
2. per-field mm: q=({!edismax qf="brand_name description" v=$qx mm=2}^10 OR
{!edismax qf="description_l1 manual_tags_l1" v=$qx mm=2} OR {!edismax
qf=description_l2 v=$qx mm=2})&qx=amul cheese cake

But we observed that the above are still converted to field-centric queries
with per-field mm, resulting in no match when the words span multiple
fields.







Solr multi word search across multiple fields with mm

2020-07-08 Thread Venu
Searching on words spanning different fields using the edismax query parser
with minimum match (mm) and sow=false generates different queries depending
on whether a field undergoes different query-time analysis (multi-word
synonyms, stop words, etc.).

Assume I have 2 documents where brand, description_synonyms and tags hold
different data:

{
  id: 1,
  brand: amul,
  description_synonyms: slice,
  tags: cheese
}
{
  id: 2,
  brand: amul,
  description_synonyms: cake,
  tags: cheese
}


Below is the parsed query string for the query "amul cheese slice". In this
case, *mm (~2) is across fields*, since none of amul, cheese, slice has
synonyms:

"parsedquery_toString": "+((((brand:amul)^10.0 |
(description_synonyms:amul)^4.0 | tags:amul)~1.0 ((brand:cheese)^10.0 |
(description_synonyms:cheese)^4.0 | tags:cheese)~1.0 ((brand:slice)^10.0 |
(description_synonyms:slice)^4.0 | tags:slice)~1.0)~2)"

while below is the parsed string for "amul cheese cake". Since cake has
synonyms such as plum cake, edismax produced the below query with *mm (~2)
per field*, resulting in no match:

"parsedquery_toString": "+(((brand:amul brand:cheese brand:cake)~2)^10.0 |
((description_synonyms:amul description_synonyms:cheese
(description_synonyms:cupcak description_synonyms:pastri
description_synonyms:\"plum cake\" description_synonyms:cake))~2)^4.0 |
((tags:amul tags:cheese tags:cake)~2))~1.0"

I want to match on individual fields rather than clubbing all fields into a
single field. 

Is there a way we can solve this? Any help would be highly appreciated.








Re: Solr multi word search across multiple fields with mm

2020-07-12 Thread Venu
After some research I came across the below articles:
1. edismax-and-multiterm-synonyms-oddities
2. apache mail archive
3. apache mail archive2

Looks like this is a known problem.

The only way I see is clubbing all the required fields into a single field
and doing mm on that field.
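A hedged sketch of that catch-all-field workaround (field and type names
here are illustrative):

In schema.xml:

<field name="all_text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="brand" dest="all_text"/>
<copyField source="description_synonyms" dest="all_text"/>
<copyField source="tags" dest="all_text"/>

Query:

q=amul cheese cake&defType=edismax&qf=all_text&mm=2

Since mm now applies within a single field, terms that live in different
source fields can still satisfy the minimum match together.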





Solr CPU spiking up on bulk indexing

2019-06-15 Thread Venu
Hi

While doing batch indexing, Solr CPU usage spikes regularly. I am
auto-committing every 5 minutes.

Please find the image below (a CPU usage graph).

On stopping the indexing, the CPU comes back to a normal state (around
20%). In the image, the CPU is also in a normal state (around 20%) between
the peaks.

Please find the logs below:

null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 5/5 ms
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 5/5 ms
at
org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:220)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:583)
at
org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
at
org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54)
at java.io.OutputStream.write(OutputStream.java:116)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:140)
at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:154)
at
org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:93)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:73)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired:
5/5 ms
at 
org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.Futur


Re: Solr CPU spiking up on bulk indexing

2019-06-18 Thread Venu
Thanks Erick.

I see the above pattern only at the time of commit.

I have many fields (around 250, out of which around 100 are dynamic fields,
plus around 3 n-gram fields and other text fields; many of them are stored
as well as indexed). Will a merge take a lot of time in this kind of case?
That is, is it CPU-intensive because of the many dynamic fields or because
of the sheer volume of data?

Also, I am doing a hard commit every 5 minutes with openSearcher=true. I am
not doing soft commits.

Below are the configurations for the filter, query and document caches.
Should I try reducing initialSize?
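The cache configuration did not come through the archive; for context, a
typical solrconfig.xml cache block looks like this (the values shown are
only the stock defaults, not the poster's actual settings):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
                  autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512"
               autowarmCount="0"/>

Large initialSize/autowarmCount values mainly cost memory and
commit-time warming rather than steady-state CPU.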

 









Solr Design question on spatial search

2012-02-29 Thread Venu Shankar
Hello,

I have a design question for Solr.

I work for an enterprise which has a lot of retail stores (approx. 20K),
spread across the world. My search requirement is to find all the cities
which are within x miles of a retail store.

So let's say we have a retail store in San Francisco and I search for
"San"; then San Francisco, Santa Clara, San Jose, San Juan, etc. should be
returned, as they are within x miles of San Francisco. I also want to rank
the search results by their distance.

I can create an index with all the cities in it, but I am not sure how to
ensure that the cities returned in a search result have a nearby retail
store. Any suggestions?

Thanks,
Venu,
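Setting the nearest-store requirement aside, the distance part is standard
spatial search; a hedged sketch (assumes each city document has a location
field of spatial type, with an example point and radius):

q=city:San*
&fq={!geofilt sfield=location pt=37.7749,-122.4194 d=16}
&sort=geodist(location,37.7749,-122.4194) asc

The harder part, as the thread goes on to discuss, is restricting matched
cities to those with a nearby store: either join against a stores index at
query time, or precompute a has_nearby_store flag per city at index time.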


Re: Solr Design question on spatial search

2012-03-02 Thread Venu Dev
So let's say x=10 miles. Now if I search for "San", then San Francisco and
San Mateo should be returned, because there is a retail store in San
Francisco. But San Jose should not be returned, because it is more than 10
miles away from San Francisco. Had there been a retail store in San Jose,
it too would be returned for a search of "San". I can restrict the queries
to a country.

Thanks,
~Venu

On Mar 2, 2012, at 5:57 AM, Erick Erickson  wrote:

> I don't see how this works, since your search for San could also return
> San Marino, Italy. Would you then return all retail stores in
> X miles of that city? What about San Salvador de Jujuy, Argentina?
> 
> And even in your example, San would match San Mateo. But should
> the search then return any stores within X miles of San Mateo?
> You have to stop somewhere
> 
> Is there any other information you have that restricts how far to expand the
> search?
> 
> Best
> Erick
> 
> On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  
> wrote:
>> I don't think Spatial search will fully fit into this. I have 2 approaches 
>> in mind but I am not satisfied with either one of them.
>> 
>> a) Have 2 separate indexes. First one to store the information about all the 
>> cities and second one to store the retail stores information. Whenever user 
>> searches for a city then I return all the matching cities from first index 
>> and then do a spatial search on each of the matched city in the second 
>> index. But this is too costly.
>> 
>> b) Index only the cities which have a nearby store. Do all the 
>> calculation(s) before indexing the data so that the search is fast. The 
>> problem that I see with this approach is that if a new retail store or a 
>> city is added then I would have to re-index all the data again.
>> 
>> 
>> On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
>> 
>>> I believe that what you need is spatial search...
>>> 
>>> Have a look a the documention:  http://wiki.apache.org/solr/SpatialSearch
>>> 
>>> On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have a design question for Solr.
>>>> 
>>>> I work for an enterprise which has a lot of retail stores (approx. 20K).
>>>> These retail stores are spread across the world.  My search requirement is
>>>> to find all the cities which are within x miles of a retail store.
>>>> 
>>>> So lets say if we have a retail Store in San Francisco and if I search for
>>>> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
>>>> returned as they are within x miles from San Francisco. I also want to rank
>>>> the search results by their distance.
>>>> 
>>>> I can create an index with all the cities in it but I am not sure how do I
>>>> ensure that the cities returned in a search result have a nearby retail
>>>> store. Any suggestions ?
>>>> 
>>>> Thanks,
>>>> Venu,
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Dirceu Vieira Júnior
>>> ---
>>> +47 9753 2473
>>> dirceuvjr.blogspot.com
>>> twitter.com/dirceuvjr
>> 


SOLR newbie question: How to filter the results based on my Unique Key

2009-02-28 Thread Venu Mittal
Hi List,

Is it possible to filter out duplicate results using a particular field in
the document?
e.g.


<doc>
  <field name="id">1</field>
  <field name="cust_id">123</field>
  <field name="email">a...@b.com</field>
</doc>


Now if I search for email = a...@b.com, I get 2 search results, but I want
to send back just one record because the cust_id is the same. Is that
possible, or do I need to handle it in the calling application?
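For readers on later Solr releases, this de-duplication is built in as
result grouping / field collapsing; a hedged sketch using the cust_id field
from the example:

fq={!collapse field=cust_id}

or, with result grouping:

group=true&group.field=cust_id&group.limit=1

At the time of this 2009 thread, only the experimental SOLR-236 patch
provided this behavior.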
 
Thanks


  

Re: SOLR newbie question: How to filter the results based on my Unique Key

2009-02-28 Thread Venu Mittal
Hi Stephen,

Thanks for the info. 

I took the latest patch (collapsing-patch-to-1.3.0-dieter.patch) and applied
it to the source code. Then I added the newly created jar to the SOLR war.
But SOLR is still ignoring the new config: I am still getting 2 records in
my result set. Is there something I am missing here?

TIA.



From: Stephen Weiss 
To: solr-user@lucene.apache.org
Sent: Saturday, February 28, 2009 10:50:26 PM
Subject: Re: SOLR newbie question: How to filter the results based on my Unique 
Key

There's an experimental patch for this I've had pretty good success with:

https://issues.apache.org/jira/browse/SOLR-236

If you don't particularly need faceting support to work 100%, it's already 
pretty much perfect.  Officially I guess they want it to make it in for version 
1.5??  But in the meantime it's pretty easy to implement and stable; just make 
sure you use the latest patch.

--
Steve

On Feb 28, 2009, at 5:45 PM, Venu Mittal wrote:

> Hi List,
> 
> Is it possible to filter out the duplicate results using a particular field 
> in the document.
> e.g.
> 
> 
> 1
> 123
> a...@b.com
> 
> 
> Now if I search for email = a...@b.com I get 2 search results but I want to 
> send just one record cause my cust_id is same. Is it possible or do I need to 
> handle it in the calling application.
> 
> Thanks
> 
> 


  

Re: SOLR newbie question: How to filter the results based on my Unique Key

2009-02-28 Thread Venu Mittal
Ok, so I tried out an XSLT transformation on the resulting XML, and I must
say that I am very impressed with the results. I will do some more load
testing tomorrow and finalize this solution.

Thanks everyone.







  

Re: How to search the database tables using solr.

2009-03-04 Thread Venu Mittal
Does anybody have any stats to share on how much time DataImportHandler
takes to index a given set of data?

I am currently indexing 18 million rows in 1.5-2 hours by sending XMLs to
Solr.




From: Shalin Shekhar Mangar 
To: solr-user@lucene.apache.org; cra...@ceiindia.com
Sent: Wednesday, March 4, 2009 8:15:07 AM
Subject: Re: How to search the database tables using solr.

On Wed, Mar 4, 2009 at 7:51 PM, Radha C.  wrote:

> Thanks Shalin,
>
> We just stepped on solr. This information is very much useful for me. But
> before that I want some clear details about where to start..
> I want to test this in my local environment, so I need some basic
> information about how to start using this ( database and solr ). Do you
> have
> some information on this?
>

I think the easiest way to start using Solr is with the embedded jetty
container. Modify the example/conf/schema.xml file and add your own fields
etc. Read through the DataImportHandler wiki page and look at the
example/example-DIH directory in the solr zip/tarball.

If you have a specific doubt/question, ask on the list.
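To make the wiki pointer concrete, a minimal data-config.xml for the JDBC
case looks roughly like this (driver, URL, table and column names are all
placeholders):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="pass"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>

It is registered in solrconfig.xml under a /dataimport request handler and
triggered with a full-import command.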

-- 
Regards,
Shalin Shekhar Mangar.



  

Highlighting the searched term in resultset

2009-03-12 Thread Venu Mittal
I was wondering if there is any way of highlighting the searched term in the
result set directly, instead of having it as a separate "lst" element.
Doing it through an XSL transformation would be one way.
Has anybody implemented a better solution?

e.g

 
  iPhone
  iphone sell buy
  2007-11-20T05:36:29Z
  2007-11-17T06:00:00Z
  ARTICLE
 



TIA.
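For reference, the separate highlighting section is driven by parameters
like these (field names assumed from the example above):

hl=true&hl.fl=title,description&hl.simple.pre=<em>&hl.simple.post=</em>

Solr returns the highlights in that separate structure keyed by document id;
merging them back into each doc is left to the client or, as suggested
above, to an XSLT response writer (wt=xslt&tr=yourstylesheet.xsl).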



  

Re: Date Search with q query parameter

2009-03-12 Thread Venu Mittal
Is your final query in this format ?

col1:[2009-01-01T00:00:00Z+TO+2009-01-01T23:59:59Z]




From: dabboo 
To: solr-user@lucene.apache.org
Sent: Thursday, March 12, 2009 12:27:48 AM
Subject: Date Search with q query parameter


Hi,

I am facing an issue with the date field, I have in my records.

e.g. I am using q query parameter and passing some string as search criteria
like "test". While creating query with q parameter, how query forms is:

column1:test | column2:test | column3:test . ...

I have one column as date column, which is appended with _dt like
column4_dt. Now, when it creates the query like 

column1:test | column2:test | column3:test | column4_dt:test 

Here it throws an exception saying "Invalid date format".

Please suggest how I can prevent this.

Thanks,
Amit Garg

-- 
View this message in context: 
http://www.nabble.com/Date-Search-with-q-query-parameter-tp22471072p22471072.html
Sent from the Solr - User mailing list archive at Nabble.com.


  

ExtractingRequestHandler Question

2009-04-03 Thread Venu Mittal
Hi,

I am using ExtractingRequestHandler to index rich text documents.
The way I am doing it: I get some data related to the document from the
database and post an XML (containing only this data) to Solr. Then I make
another call to Solr, which sends the actual document to be indexed.
But in doing so, I am losing all the other data related to the document.

Is this the right way to handle it, or am I missing something?

TIA
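For what it's worth, ExtractingRequestHandler can accept the database
fields in the same request as the file via literal.* parameters, which
avoids splitting the data across two updates (the URL, document id and
field names here are illustrative):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&literal.cust_id=123&commit=true" \
  -F "myfile=@document.pdf"

Each literal.<field>=<value> pair is added to the document built from the
extracted text, so the metadata and the file content end up in a single
Solr document.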



  

Re: ExtractingRequestHandler Question

2009-04-06 Thread Venu Mittal
Hi Jacob,

Thanks for the reply. I am still trying to nail down this problem with the
best possible solution.
Yeah, I had thought about these 2 approaches, but both of them are going to
make my indexing slower. Plus, the fact that I will have at least 5 rich
text files associated with each document is not helping much either.

Anyway, I will explore and see if I can come up with anything better (maybe
a separate index for rich text docs).

Thanks,
Venu




From: Jacob Singh 
To: solr-user@lucene.apache.org
Sent: Saturday, April 4, 2009 9:59:13 PM
Subject: Re: ExtractingRequestHandler Question

Hi TIA,

I have the same desired requirement.  If you look up in the archives,
you might find a similar thread between myself and the always super
helpful Erik Hatcher.  Basically, it can't be done (right now).

You can however use the "ExtractOnly" request handler, and just get
the extracted text back from solr, and then use xpath to get out the
attributes and then add them to your XML you are sending.

Not ideal because the file has to be transferred twice.
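That extract-only flow can be sketched like this (the response snippet below is a simplified stand-in for a real Solr reply, not a captured one): pull the extracted text out of the response before merging it into your own XML document.

```python
# Sketch: parse an "extract only" style response and pull out the
# extracted text for a given file entry. Sample XML is illustrative.
import xml.etree.ElementTree as ET

sample_response = """
<response>
  <str name="file.doc">&lt;body&gt;Extracted document text&lt;/body&gt;</str>
</response>
"""

def extracted_text(response_xml, filename):
    """Return the extracted text for `filename`, or None if absent."""
    root = ET.fromstring(response_xml)
    for node in root.findall("str"):
        if node.get("name") == filename:
            return node.text
    return None

print(extracted_text(sample_response, "file.doc"))
```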

The only other option is to send the file as per the instructions via
POST with its attributes as POST fields.
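That second option can be sketched as a single request to the ExtractingRequestHandler carrying the database fields as `literal.<field>` parameters alongside the file (the base URL and field names below are illustrative):

```python
# Sketch: build the /update/extract request URL with literal.* params
# so the document's metadata is indexed together with the extracted text.
from urllib.parse import urlencode

def extract_url(solr_base, doc_id, metadata):
    """Build the extract-handler URL carrying metadata as literal.* params."""
    params = {"literal.id": doc_id}
    for field, value in metadata.items():
        params[f"literal.{field}"] = value
    return f"{solr_base}/update/extract?{urlencode(params)}"

url = extract_url("http://localhost:8983/solr", "42",
                  {"title": "Quarterly report", "author": "venu"})
print(url)
```

The file itself would then be POSTed to that URL as the request body.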

Keep in mind that Solr documents are immutable, which means they
cannot change.  When you update a document with the same primary key,
it will simply delete the existing one and add the new one.

hth,
Jacob

On Sat, Apr 4, 2009 at 5:59 AM, Venu Mittal  wrote:
> Hi,
>
> I am using ExtractingRequestHandler to index  rich text documents.
> The way I am doing it is I get some data related to the document from 
> database and then post an xml  (containing only this data ) to solr. Then I 
> make another call to solr, which sends the actual document to be indexed.
> But while doing so I am loosing out all the other data that is related to the 
> document.
>
> Is this the right way to do handle it or am I missing out on something.
>
> TIA
>
>
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com



  

Re: ExtractingRequestHandler Question

2009-05-10 Thread Venu Mittal
Hi,

Wondering if somebody could help me in understanding the following behavior :-

If I search on a text field with the query "davi cla" then it does not
yield any search results; however, if I search for "davi clai" it returns
100+ results.

The field I am searching on is a text field with the following definition in my
schema (the fieldType/analyzer XML did not survive the list archive):


Thanks in advance !

Venu
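Since the field definition did not survive the archive, here is one hypothetical cause that fits the symptom: if the index analyzer applies an edge n-gram filter with minGramSize=4, no 3-character terms are ever indexed, so the query token "cla" finds nothing while "clai" matches. A small illustration (the filter and its settings are assumptions, not taken from the question):

```python
# Sketch: generate edge n-grams the way an EdgeNGram-style filter with
# minGramSize=4 would, and check which query tokens can match.
def edge_ngrams(token, min_gram=4, max_gram=15):
    """Return leading substrings of `token` from min_gram to max_gram chars."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

index_terms = edge_ngrams("claims")
print(index_terms)            # ['clai', 'claim', 'claims']
print("cla" in index_terms)   # False: shorter than minGramSize
print("clai" in index_terms)  # True
```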



  

Re: Solr Design question on spatial search

2012-03-01 Thread Venu Gmail Dev
I don't think Spatial search will fully fit into this. I have 2 approaches in 
mind but I am not satisfied with either one of them.

a) Have 2 separate indexes. First one to store the information about all the 
cities and the second one to store the retail store information. Whenever a user
searches for a city, I return all the matching cities from the first index and
then do a spatial search around each matched city in the second index. But
this is too costly.

b) Index only the cities which have a nearby store. Do all the calculation(s) 
before indexing the data so that the search is fast. The problem that I see 
with this approach is that if a new retail store or a city is added then I 
would have to re-index all the data again.
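Approach (b)'s precomputation step could be sketched like this (store and city coordinates are illustrative; 3958.8 is the Earth's radius in miles):

```python
# Sketch: before indexing, keep only the cities that lie within 10 miles
# of at least one retail store.
from math import radians, sin, cos, asin, sqrt

def miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))

stores = [(37.7749, -122.4194)]                     # a San Francisco store
cities = {"San Francisco": (37.7749, -122.4194),
          "San Mateo": (37.5630, -122.3255),
          "San Jose": (37.3382, -121.8863)}

indexable = [name for name, (lat, lon) in cities.items()
             if any(miles(lat, lon, s_lat, s_lon) <= 10
                    for s_lat, s_lon in stores)]
print(indexable)
```

The re-indexing cost when a store is added could be limited by re-checking only cities near the new store rather than the whole set.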


On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:

> I believe that what you need is spatial search...
> 
> Have a look at the documentation:  http://wiki.apache.org/solr/SpatialSearch
> 
> On Wed, Feb 29, 2012 at 10:54 PM, Venu Shankar 
> wrote:
> 
>> Hello,
>> 
>> I have a design question for Solr.
>> 
>> I work for an enterprise which has a lot of retail stores (approx. 20K).
>> These retail stores are spread across the world.  My search requirement is
>> to find all the cities which are within x miles of a retail store.
>> 
>> So lets say if we have a retail Store in San Francisco and if I search for
>> "San" then San Francisco, Santa Clara, San Jose, San Juan, etc  should be
>> returned as they are within x miles from San Francisco. I also want to rank
>> the search results by their distance.
>> 
>> I can create an index with all the cities in it but I am not sure how do I
>> ensure that the cities returned in a search result have a nearby retail
>> store. Any suggestions ?
>> 
>> Thanks,
>> Venu,
>> 
> 
> 
> 
> -- 
> Dirceu Vieira Júnior
> ---
> +47 9753 2473
> dirceuvjr.blogspot.com
> twitter.com/dirceuvjr



Re: Solr Design question on spatial search

2012-03-02 Thread Venu Gmail Dev
Sorry for not being clear enough.

I don't know the point of origin. All I know is that there are 20K retail 
stores. Only the cities within a 10-mile radius of these stores should be 
searchable. Any city outside these small 10-mile circles around the 
20K stores should be ignored.

So when somebody searches for a city, I need to query the cities that fall in 
one of these 20K 10-mile circles, but I don't know which circle to query.

So the approach that I was thinking were :-

>>>> a) Have 2 separate indexes. First one to store the information about all 
>>>> the cities and second one to store the retail stores information. Whenever 
>>>> user searches for a city then I return all the matching cities ( and hence 
>>>> the lat-long) from first index and then do a spatial search on each of the 
>>>> matched city in the second index. But this is too costly.
>>>> 
>>>> b) Index only the cities which have a nearby store. Do all the 
>>>> calculation(s) before indexing the data so that the search is fast. The 
>>>> problem that I see with this approach is that if a new retail store or a 
>>>> city is added then I would have to re-index all the data again.

Does this answer the problem you posed?

Thanks,
Venu.
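To make approach (a) concrete, here is a sketch of the two request parameter sets (the field names, the `sfield`, and the prefix-query form are assumptions; Solr's {!geofilt} takes `d` in kilometers, so 10 miles is about 16.09 km):

```python
# Sketch: step 1 finds candidate cities by name prefix in the city index;
# step 2 checks each candidate for a store within 10 miles in the store
# index using Solr's {!geofilt} query parser.
def city_query(prefix):
    return {"q": f"name:{prefix}*", "fl": "name,location"}

def store_check(lat, lon, radius_miles=10):
    km = radius_miles * 1.609344
    return {"q": "*:*",
            "fq": f"{{!geofilt pt={lat},{lon} sfield=location d={km:.2f}}}",
            "rows": 0}   # only the hit count is needed

print(city_query("San"))
print(store_check(37.7749, -122.4194))
```

The cost concern stands: step 2 issues one request per matched city, which is why precomputing store proximity before indexing is attractive.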

On Mar 2, 2012, at 9:52 PM, Erick Erickson wrote:

> But again, that doesn't answer the problem I posed. Where is your
> point of origin?
> There's nothing in what you've written that indicates how you would know
> that 10 miles is relative to San Francisco. All you've said is that
> you're searching
> on "San". Which would presumably return San Francisco, San Mateo, San Jose.
> 
> Then, also presumably, you're looking for all the cities with stores
> within 10 miles
> of one of these cities. But nothing in your criteria so far says that
> that city is
> San Francisco.
> 
> If you already know that San Francisco is the locus, simple distance
> will work just
> fine. You can index both city and store info in the same index and
> restrict, say, facets
> (or, indeed search results) by fq clause (e.g. fq=type:city or fq=type:store).
> 
> Or I'm completely missing the boat here.
> 
> Best
> Erick
> 
> 
> On Fri, Mar 2, 2012 at 11:50 AM, Venu Dev  wrote:
>> So let's say x=10 miles. Now if I search for San then San Francisco, San 
>> Mateo should be returned because there is a retail store in San Francisco. 
>> But San Jose should not be returned because it is more than 10 miles away 
>> from San
>> Francisco. Had there been a retail store in San Jose then it should be also 
>> returned when you search for San. I can restrict the queries to a country.
>> 
>> Thanks,
>> ~Venu
>> 
>> On Mar 2, 2012, at 5:57 AM, Erick Erickson  wrote:
>> 
>>> I don't see how this works, since your search for San could also return
>>> San Marino, Italy. Would you then return all retail stores in
>>> X miles of that city? What about San Salvador de Jujuy, Argentina?
>>> 
>>> And even in your example, San would match San Mateo. But should
>>> the search then return any stores within X miles of San Mateo?
>>> You have to stop somewhere
>>> 
>>> Is there any other information you have that restricts how far to expand the
>>> search?
>>> 
>>> Best
>>> Erick
>>> 
>>> On Thu, Mar 1, 2012 at 4:57 PM, Venu Gmail Dev  
>>> wrote:
>>>> I don't think Spatial search will fully fit into this. I have 2 approaches 
>>>> in mind but I am not satisfied with either one of them.
>>>> 
>>>> a) Have 2 separate indexes. First one to store the information about all 
>>>> the cities and second one to store the retail stores information. Whenever 
>>>> user searches for a city then I return all the matching cities from first 
>>>> index and then do a spatial search on each of the matched city in the 
>>>> second index. But this is too costly.
>>>> 
>>>> b) Index only the cities which have a nearby store. Do all the 
>>>> calculation(s) before indexing the data so that the search is fast. The 
>>>> problem that I see with this approach is that if a new retail store or a 
>>>> city is added then I would have to re-index all the data again.
>>>> 
>>>> 
>>>> On Mar 1, 2012, at 7:59 AM, Dirceu Vieira wrote:
>>>> 
>>>>> I believe that what you need is spatial search...
>>>>> 
>>>>> Have a look a the documention:  h