Solr suggester query with quotes produces different results
Hi guys, I have the Suggester configured using the FreeTextFactory. I noticed that if I don't use quotation marks, I only get single-term results. If I use quotation marks around my query, then I only get results that are made up of multiple terms. There is no configuration that would return both types of results with a single query. Thanks, Angel
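One client-side workaround, offered as a sketch under assumptions and not as something confirmed in this thread: send the query twice, once bare and once wrapped in quotation marks, and merge the two suggestion lists. The handler path `/suggest`, the dictionary name `freeTextSuggester`, the core name, and both helper names are hypothetical; `suggest`, `suggest.dictionary`, and `suggest.q` are the usual suggester request parameters.

```python
from urllib.parse import urlencode


def suggest_urls(base_url, term, dictionary="freeTextSuggester"):
    """Build two suggester request URLs for one term: unquoted, then quoted."""
    urls = []
    for q in (term, f'"{term}"'):
        params = {
            "suggest": "true",
            "suggest.dictionary": dictionary,  # hypothetical dictionary name
            "suggest.q": q,
            "wt": "json",
        }
        urls.append(f"{base_url}/suggest?{urlencode(params)}")
    return urls


def merge_suggestions(*result_lists):
    """Concatenate suggestion lists, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for suggestion in results:
            if suggestion not in seen:
                seen.add(suggestion)
                merged.append(suggestion)
    return merged
```

The merge keeps the order of the first list, so whichever query is passed first decides how single-term and multi-term suggestions are ranked against each other.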
Re: Allow Join over two sharded collection
Hi Susheel, We currently have around 20M documents and expect about 1M more every month from now on. The reason we don't want to go for time-based implicit routing is that all new documents would end up in the most recent shard, so indexing would be heavy for that shard, whereas older shards would be used only for queries. With default sharding, the indexing load is distributed across all the shards. That's the reason we would like to stick to default sharding. But Join is the issue here when default sharding is used :-( -- View this message in context: http://lucene.472066.n3.nabble.com/Allow-Join-over-two-sharded-collection-tp4343443p4343803.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Allow Join over two sharded collection
1M docs/month shouldn't make Solr break a sweat. If it really worries you and you're indexing in a big batch, index during off hours. At very worst, if you're ingesting them all at once you might have to throttle the indexing a bit. Frankly, most of the time acquiring the documents from the system of record is where the bottleneck is and Solr easily handles the indexing load. The other advantage is that if you use implicit routing rather than a composite ID, you can add shards to your collection one at a time as required, for time-series data that's an elegant way to "age out" old documents. Best, Erick
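As a rough sketch of the shard-per-period idea: with the implicit router, a collection is created with explicitly named shards, more shards are added later with the Collections API `CREATESHARD` action, and old ones can be dropped with `DELETESHARD`. The host, the monthly shard naming scheme, and the helper names are made up for illustration, and other parameters such as the configset and replication factor are left out.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed host and port


def shard_for(year, month):
    """Hypothetical naming scheme: one shard per calendar month."""
    return f"shard_{year:04d}_{month:02d}"


def create_collection_url(name, shards):
    """Collections API CREATE call using the implicit router and named shards."""
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def create_shard_url(collection, shard):
    """Collections API CREATESHARD call: add one more named shard."""
    params = {"action": "CREATESHARD", "collection": collection, "shard": shard}
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def delete_shard_url(collection, shard):
    """Collections API DELETESHARD call: drop a shard whose data has aged out."""
    params = {"action": "DELETESHARD", "collection": collection, "shard": shard}
    return f"{SOLR}/admin/collections?{urlencode(params)}"
```

A monthly job could call `create_shard_url` for the coming month and `delete_shard_url` for the month that falls outside the retention window.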
Re: Allow Join over two sharded collection
As Erick said, 1M docs/month isn't a big deal. I have 45+ million docs in one shard, but YMMV depending on other factors. Also, there is a lot of confusion in the terminology. The default routing is compositeId routing. The implicit routing that Erick mentioned is the manual routing. https://issues.apache.org/jira/browse/SOLR-6630 Which routing are you suggesting to use? Can you clarify again? Also, what's your exact use case? Do you query old documents, or are most or all of your queries supposed to go to the shard with newer documents? Thanks, Susheel
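To make the two routing modes concrete, here is a small sketch. With the default compositeId router, a prefix placed before `!` in the document ID is hashed to choose the shard, so documents sharing a prefix land on the same shard. With the implicit router, the caller names the target shard, for example through the `_route_` request parameter. The prefix `customerA`, the field `type_s`, and the shard name are hypothetical.

```python
def composite_id(route_key, doc_id):
    """compositeId router: the part before '!' is hashed to pick the shard."""
    return f"{route_key}!{doc_id}"


def implicit_route_params(shard_name):
    """implicit router: the request itself names the target shard."""
    return {"_route_": shard_name}


# Two documents that the compositeId router would place on the same shard.
docs = [
    {"id": composite_id("customerA", n), "type_s": "order"}
    for n in ("1001", "1002")
]

# Extra request parameters for sending an update to one named shard.
params = implicit_route_params("shard_2017_07")
```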
Re: Allow Join over two sharded collection
Depending on your use case people also use collection aliasing for time series data. See below https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
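A minimal sketch of the aliasing approach, assuming one collection per month and a single alias that queries go through. `CREATEALIAS` is the Collections API action; as far as I understand it, re-issuing it with an existing alias name repoints that alias. The collection names, the alias name, and the helper names are invented for illustration.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Collections API CREATEALIAS call pointing one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def alias_for_recent(solr_base, alias, monthly_collections, keep):
    """Point the alias at the `keep` most recent collections.

    Assumes the collection names sort chronologically, e.g. docs_2017_06.
    """
    recent = sorted(monthly_collections)[-keep:]
    return create_alias_url(solr_base, alias, recent)
```

Queries keep using the alias name, so rolling the window forward each month needs no change on the client side.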
Re: Allow Join over two sharded collection
Unsubscribe
Re: Include JSON facet inside Solr Streaming
Yes. In general, any expression can be nested inside other expressions or stream sources.

On Sat, Jul 1, 2017 at 1:43 AM, Zheng Lin Edwin Yeo wrote:
> Is it possible to do a join (e.g. hashJoin, innerJoin) on the facet stream expression?
>
> Regards,
> Edwin
>
> On 1 July 2017 at 03:30, Susheel Kumar wrote:
>> I doubt it can work. Why not utilise the facet stream expression, which uses JSON facet under the covers?
>>
>> On Thu, Jun 29, 2017 at 9:44 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>> Hi,
>>>
>>> Is it currently possible to include JSON facet inside Solr Streaming?
>>>
>>> I am trying out the following query, which combines JSON facet together with the hashJoin from Streaming, but we get the error saying that is not a proper expression clause.
>>>
>>> If it is possible, what should be the correct way to include it?
>>>
>>> I'm using Solr 6.5.1.
>>>
>>> http://localhost:8983/edm/collection1/stream?expr=hashJoin(
>>> search(collection1,
>>> q="id:",
>>> fq="{!child of="contentType_s:collection1Header"}field1a:*&json.facet={
>>> TotalAmount1:"sum(totalAmount1)"}",
>>> fl="field1a,field1b,field1c,field1d",
>>> sort="field1a asc",
>>> qt="/export"),
>>> hashed=search(collection2,
>>> q"=id:",
>>> fq="json.facet={
>>> TotalAmount2:"sum(totalAmount2)"}",
>>> fl="field2a,field2b,field2c,field2d",
>>> sort="field2a asc",
>>> qt="/export"),
>>> on="field1a=field1b"
>>> )&indent=true
>>>
>>> Regards,
>>> Edwin
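Following the suggestion in this thread to use the facet stream expression in place of `json.facet` inside `fq`, the nesting might look roughly like the sketch below. It only assembles the expression string and URL-encodes it. The bucket fields, the join key `field1a=field2a`, `q="*:*"`, and the bucket size limit are assumptions, not values confirmed by the thread; the base URL is copied from the original question.

```python
from urllib.parse import quote


def facet_expr(collection, bucket, metric, limit=100):
    """Assemble a facet() stream source with one bucket field and one metric."""
    return (
        f'facet({collection}, q="*:*", buckets="{bucket}", '
        f'bucketSorts="{metric} desc", bucketSizeLimit={limit}, {metric})'
    )


def hash_join(left, right, on):
    """Wrap two stream expressions in hashJoin(); `right` is the hashed side."""
    return f'hashJoin({left}, hashed={right}, on="{on}")'


expr = hash_join(
    facet_expr("collection1", "field1a", "sum(totalAmount1)"),
    facet_expr("collection2", "field2a", "sum(totalAmount2)"),
    on="field1a=field2a",  # assumed join key
)

# Percent-encode the whole expression before placing it in the query string.
url = "http://localhost:8983/edm/collection1/stream?expr=" + quote(expr)
```

Building the expression from small functions keeps the quoting manageable, which is where the hand-written URL in the question appears to go wrong.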
Re: Include JSON facet inside Solr Streaming
Ok, thank you. Regards, Edwin
Re: Unique() metrics not supported in Solr Streaming facet stream source
Will try to do it if I have the time.

Regards,
Edwin

On 30 June 2017 at 01:23, Erick Erickson wrote:
> Can you work up a patch if it's a priority for you?
>
> Best,
> Erick
>
> On Thu, Jun 29, 2017 at 8:51 AM, Zheng Lin Edwin Yeo wrote:
>> Hi Joel,
>>
>> Thanks for your reply.
>>
>> Hopefully we can see it in the new version soon, as it will be helpful for the project which we are working on.
>>
>> Regards,
>> Edwin
>>
>> On 29 June 2017 at 21:32, Joel Bernstein wrote:
>>> This is mainly due to focus on other things. It would be great to support all the aggregate functions in facet, rollup and timeseries expressions.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Thu, Jun 29, 2017 at 8:23 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> We are working on the Solr Streaming expression, using the facet stream source.
>>>>
>>>> As the underlying structure is using JSON Facet, we would like to find out why the unique() metric is not supported. Currently, it only supports sum(col), avg(col), min(col), max(col), count(*).
>>>>
>>>> I'm using Solr 6.5.1.
>>>>
>>>> Regards,
>>>> Edwin
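Until other aggregates are supported, a client could check metrics before building a facet expression. The sketch below encodes only the list given in this thread for Solr 6.5.1 (`sum`, `avg`, `min`, `max`, `count(*)`); the helper name and the error message are hypothetical, and the set would need updating for other versions.

```python
import re

# Metrics the thread reports as supported by the facet stream source in Solr 6.5.1.
SUPPORTED_METRICS = {"sum", "avg", "min", "max", "count"}


def check_metric(metric):
    """Return the metric unchanged if its function name is in the supported set."""
    text = metric.strip()
    match = re.fullmatch(r"(\w+)\((.+)\)", text)
    if match is None:
        raise ValueError(f"not a metric expression: {metric!r}")
    name = match.group(1).lower()
    if name not in SUPPORTED_METRICS:
        raise ValueError(f"{name}() is not in the supported list for the facet stream")
    return text
```

Failing early on the client gives a clearer message than an error returned from the `/stream` handler.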