fetch streaming expression multiple collections problem
Hello all,

When I use the "search" streaming expression with multiple collections, it works without any problems:

    search(
      "collection1,collection2",
      q="*:*",
      fl="field1,field2",
      qt="/export",
      sort="field1 desc"
    )

But when I try to use the "fetch" expression similarly:

    fetch( "collection1,collection2"

it gives me an error:

    "EXCEPTION": "java.io.IOException: Slices not found for \"collection1,collection2\""

When I drop the quotes, that error goes away but another problem arises:

    fetch( collection1,collection2

This fetches fields only from collection1 and returns nothing for documents residing in collection2.

I took a look at the source code of the fetch and search expressions; both read the collection parameter in exactly the same way:

    String collectionName = factory.getValueOperand(expression, 0)

I'm lost. When I use an alias in place of multiple collections it works as desired, but we have many collections and queries are generated dynamically, so we would need many combinations of aliases.

Need help.

Regards

--
uyilmaz
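Since an alias works with fetch but the collection sets are generated dynamically, one possible workaround is to create aliases on demand, with a name derived from the collection list. The sketch below only builds a Collections API CREATEALIAS request URL and sends nothing. The base URL, the alias_name helper and its "dyn_" naming scheme are assumptions made for illustration, not something taken from this thread.

```python
import hashlib
from urllib.parse import urlencode


def alias_name(collections):
    """Derive a stable alias name from a set of collections (hypothetical scheme)."""
    key = ",".join(sorted(set(collections)))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return f"dyn_{digest}"


def create_alias_url(base_url, collections):
    """Build a Collections API CREATEALIAS request URL for the given collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias_name(collections),
        "collections": ",".join(sorted(set(collections))),
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed base URL; the order of the input collections does not change the alias.
    print(create_alias_url("http://localhost:8983/solr", ["collection2", "collection1"]))
```

Because the name depends only on the sorted set of collections, the same combination always maps to the same alias, so a query generator could create the alias once and reuse it in later fetch expressions.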
Worker node / collection creation, parallelized streams
Hi all,

Today I was fiddling with a streaming expression that takes too long to finish and times out. First of all, is it normal for it to time out, rather than just take a long time?

Then I read about parallelized streaming expressions, which take a number of workers as a parameter. We have 10 nodes in our cluster.

First question: if I want to run it on 10 worker nodes, should I provide a partition key that takes exactly 10 different values, or does Solr itself derive 10 partitions from it? The "mod" function query with modulus 10 came to mind, but I got various errors when using it as a partition key.

Second question: how do I correctly create a worker collection? Should it be an empty collection with 10 shards of 1 replica each, or 1 shard with 10 replicas? When I used the latter, I got array IndexOutOfBounds errors with the workers parameter set to greater than 1.

~Regards

--
uyilmaz
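On the first question, my understanding is that the partition key does not need exactly as many distinct values as there are workers: each tuple's key value is hashed, and the hash modulo the worker count selects the worker. The sketch below simulates that idea only. CRC32 is a stand-in hash (Solr's own hash function differs), and workerCollection, collection1 and userid are placeholder names.

```python
import zlib
from collections import Counter


def worker_for(key, workers):
    """Assign a partition key value to a worker by hashing it (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % workers


def partition(keys, workers):
    """Count how many key values each worker would receive."""
    return Counter(worker_for(k, workers) for k in keys)


# Shape of a parallel expression; all collection and field names are placeholders.
PARALLEL_EXPR = """
parallel(workerCollection,
         search(collection1, q="*:*", fl="userid", qt="/export",
                sort="userid asc", partitionKeys="userid"),
         workers="10",
         sort="userid asc")
"""

if __name__ == "__main__":
    counts = partition(range(100000), workers=10)
    for worker in sorted(counts):
        print(worker, counts[worker])
```

A high-cardinality key such as a user id spreads fairly evenly this way, whereas a key with only a handful of distinct values can leave some workers idle.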
Solr Web UI
Hello all,

Our Solr web UI (/solr/#/) doesn't show query results if the query takes longer than, say, 3-4 seconds. When I look at the browser console, I see the request getting cancelled. I went through the JavaScript code but didn't see a part that cancels the request after a couple of seconds.

Do you see this behavior too? Is it intentional? I usually use Postman for querying, so this is not a problem most of the time, but I just wanted to see the streaming expression explanation diagrams.

Have a nice day~~

--
uyilmaz
Strange fetch streaming expression doesn't fetch fields sometimes?
Hi all,

I have a streaming expression that looks like this:

    fetch(
      myAlias,
      top(
        n=3,
        various expressions here
        sort="count(*) desc"
      ),
      fl="username", on="userid=userid", batchSize=3
    )

which fails to fetch the username field for the 1st result:

    {
      "result-set":{
        "docs":[{
            "userid":"123123",
            "count(*)":58}
          ,{
            "userid":"123123123",
            "count(*)":32,
            "username":"Ayha"}
          ,{
            "userid":"12432423321323",
            "count(*)":30,
            "username":"MEHM"}
          ,{
            "EOF":true,
            "RESPONSE_TIME":34889}]}}

But strangely, when I change n and batchSize both to 2 and touch nothing else, fetch fetches the first username correctly:

    fetch(
      myAlias,
      top(
        n=2,
        various expressions here
        sort="count(*) desc"
      ),
      fl="username", on="userid=userid", batchSize=2
    )

Result is:

    {
      "result-set":{
        "docs":[{
            "userid":"123123",
            "count(*)":58,
            "username":"mura"}
          ,{
            "userid":"123123123",
            "count(*)":32,
            "username":"Ayha"}
          ,{
            "EOF":true,
            "RESPONSE_TIME":34889}]}}

What can be the problem?

Regards

~~ufuk

--
uyilmaz
Re: Strange fetch streaming expression doesn't fetch fields sometimes?
I think I found the reason right after asking (facepalm), but it took me days to realize this.

I think fetch performs a naive "in" query, something like:

    q="userid:(123123 123123123 12432423321323)&rows={batchSize}"

When the userid-to-document relation is one-to-many, the above query can return documents that belong entirely to the last two userids, so the first one is left out, resulting in an empty username. The docs state that one-to-many is not supported with fetch, but I didn't stumble onto this issue until recently, so I just assumed it would work.

Sorry to take your time, I hope this helps somebody later.

Have a nice day.

On Wed, 14 Oct 2020 00:38:05 +0300 uyilmaz wrote:
> Hi all,
>
> I have a streaming expression looking like:
> [...]

--
uyilmaz
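The hypothesis above can be reproduced with a small simulation. This is only a sketch of the suspected behaviour (one "in" query per batch, capped at rows=batchSize), not Solr's actual fetch implementation. The naive_fetch function and the sample documents are made up for illustration.

```python
def naive_fetch(index_docs, tuples, key, fl, batch_size):
    """Simulate a fetch that issues one 'in' query per batch, capped at rows=batch_size."""
    out = []
    for start in range(0, len(tuples), batch_size):
        batch = tuples[start:start + batch_size]
        wanted = {t[key] for t in batch}
        # The query returns at most batch_size documents, in index order.
        hits = [d for d in index_docs if d[key] in wanted][:batch_size]
        lookup = {}
        for d in hits:
            lookup.setdefault(d[key], d)
        for t in batch:
            joined = dict(t)
            if t[key] in lookup:
                joined[fl] = lookup[t[key]][fl]
            out.append(joined)
    return out


if __name__ == "__main__":
    # One-to-many: userids "b" and "c" each have two documents and happen to come first.
    index_docs = [
        {"userid": "b", "username": "Ayha"},
        {"userid": "b", "username": "Ayha"},
        {"userid": "c", "username": "MEHM"},
        {"userid": "c", "username": "MEHM"},
        {"userid": "a", "username": "mura"},
    ]
    tuples = [{"userid": "a"}, {"userid": "b"}, {"userid": "c"}]
    print(naive_fetch(index_docs, tuples, "userid", "username", batch_size=3))
```

With batch_size=3 the three returned rows are used up by the duplicate documents for "b" and "c", so "a" gets no username. With a smaller batch the same data may join correctly, which would match the n=2 observation.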
just testing if my emails are reaching the mailing list
Hello all,

I have not yet received an answer to any of my questions on this mailing list, and my mail client shows INVALID next to my mail address, so I thought I should check whether my emails are reaching you.

Can anyone reply?

Regards

--
uyilmaz
Re: just testing if my emails are reaching the mailing list
Thank you!

On Wed, 14 Oct 2020 09:41:16 +0200 Szűcs Roland wrote:
> Hi,
> I got it from the solr user list.
>
> Roland
>
> uyilmaz wrote (on Wed, 14 Oct 2020, 9:39):
>
> > Hello all,
> >
> > I have never got an answer to my questions in this mailing list yet, and
> > my mail client shows INVALID next to my mail address, so I thought I should
> > check if my emails are reaching to you.
> >
> > Can anyone reply?
> >
> > Regards
> >
> > --
> > uyilmaz

--
uyilmaz
Re: Strange fetch streaming expression doesn't fetch fields sometimes?
Is it possible to duplicate its functionality using existing expressions? In SQL, while grouping, you can just say first(column) to get one of the one-to-many values if you don't care which one you get. Solr usually only has aggregation functions such as min, max and avg. If it had a "first" function, I could just get userid and first(username) in an expression. I sometimes use min(username) as a trick while faceting to get extra fields alongside faceted results, but max and min only accept numbers in streaming expressions.

On Wed, 14 Oct 2020 20:47:28 -0400 Joel Bernstein wrote:
> Yes, the docs mention one-to-one and many-to-one fetches, but one-to-many
> is not supported currently. I've never really been happy with fetch. It
> really needs to be replaced with a standard nested loop join that handles
> all scenarios.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Oct 13, 2020 at 6:30 PM uyilmaz wrote:
>
> > I think I found the reason right after asking (facepalm), but it took me
> > days to realize this.
> > [...]

--
uyilmaz
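The first(column) behaviour described above can be emulated on the client once the tuples have been read. A minimal sketch, assuming the tuples are available as Python dicts; first_per_key is a hypothetical helper, not a Solr function.

```python
def first_per_key(docs, key, field):
    """Return {key value: first seen field value}, mimicking SQL's first(column) per group."""
    firsts = {}
    for doc in docs:
        if field in doc and doc[key] not in firsts:
            firsts[doc[key]] = doc[field]
    return firsts


if __name__ == "__main__":
    docs = [
        {"userid": "1", "username": "mura"},
        {"userid": "1", "username": "murat"},
        {"userid": "2", "username": "Ayha"},
    ]
    print(first_per_key(docs, "userid", "username"))
```

Which value counts as "first" depends entirely on the order in which the tuples arrive, so this only fits cases where any one of the values will do.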
Very high disk read rate with an idle solr
What can cause a very high disk read rate (1G/s, which is the maximum our disks can provide) that goes on for hours, on a Solr instance that is not being indexed or queried?

For the last few days our SolrCloud cluster has stopped responding to queries. Today we stopped indexing and querying it to find out what is going on. 2 collections seem to be in recovery; can recovery cause this behavior?

Regards and have a nice day

--
uyilmaz
Faceting on indexed=false stored=false docValues=true fields
Hey all,

From my little experiments, I see that (if I didn't make a stupid mistake) we can facet on fields marked with both indexed and stored set to false:

    ... indexed="false" stored="false" docValues="true"/>

I'm surprised by this; I thought I would need to index it. Can you confirm this?

Regards

--
uyilmaz
Re: Faceting on indexed=false stored=false docValues=true fields
Thanks! This also contributed to my confusion:

https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters

"If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the copyField directive in your Schema to create two versions of the field: one Text and one String. Make sure both are indexed="true"."

On Mon, 19 Oct 2020 13:08:00 -0400 Alexandre Rafalovitch wrote:
> I think this is all explained quite well in the Ref Guide:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>
> DocValues is a different way to index/store values. Faceting is a
> primary use case where docValues are better than what 'indexed=true'
> gives you.
>
> Regards,
>    Alex.
>
> On Mon, 19 Oct 2020 at 12:51, uyilmaz wrote:
> >
> > Hey all,
> > [...]

--
uyilmaz
Re: Faceting on indexed=false stored=false docValues=true fields
Thanks for taking time to write a detailed answer.

We use Solr both to store our data and to perform aggregations, using faceting or streaming expressions. When the required analysis is too complex to do in Solr, we export large query results from Solr to a more capable analysis tool.

So I guess all fields need to be docValues="true", because the export handler and streaming both require fields to have docValues, and even if I won't use a field in queries or facets, it should be available to read in the result set. Fields that won't be searched or faceted can be (indexed=false stored=false docValues=true), right?

--uyilmaz

On Mon, 19 Oct 2020 14:14:27 -0400 Michael Gibney wrote:
> As you've observed, it is indeed possible to facet on fields with
> docValues=true, indexed=false; but in almost all cases you should
> probably set indexed=true. 1. for distributed facet count refinement,
> the "indexed" approach is used to look up counts by value; 2. assuming
> you're wanting to do something usual, e.g. allow users to apply
> filters based on facet counts, the filter application would use the
> "indexed" approach as well. Where indexed=false, if either filtering
> or distributed refinement is attempted, I'm not 100% sure what
> happens. It might fail, or lead to inconsistent results, or attempt to
> look up results via the equivalent of a "table scan" over docValues (I
> think the last of these is what actually happens, fwiw) ... but none
> of these options is likely desirable.
>
> Michael
>
> On Mon, Oct 19, 2020 at 1:42 PM uyilmaz wrote:
> >
> > Thanks! This also contributed to my confusion:
> > [...]

--
uyilmaz
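The advice in this exchange can be condensed into a rule of thumb. The sketch below is only a summary of what was said in the thread (filtering and distributed facet refinement go through the index; faceting, the export handler and streaming read docValues). It is not an official recommendation, and field_flags and as_attributes are hypothetical helpers.

```python
def field_flags(searched_or_filtered, faceted, exported):
    """Suggest indexed/docValues flags from how a field is used (rule of thumb from this thread)."""
    return {
        # Filtering and distributed facet refinement look values up through the index.
        "indexed": searched_or_filtered or faceted,
        # Faceting, /export and streaming expressions read values from docValues.
        "docValues": faceted or exported,
    }


def as_attributes(flags):
    """Render the flags as schema-style attributes, e.g. indexed="true" docValues="false"."""
    return " ".join(f'{name}="{str(value).lower()}"' for name, value in flags.items())


if __name__ == "__main__":
    # A field that is only exported, never searched or faceted.
    print(as_attributes(field_flags(False, False, True)))
```

Under this rule an export-only field ends up with indexed="false" docValues="true", which is the combination asked about above.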
Re: Faceting on indexed=false stored=false docValues=true fields
Sorry, correction: taking "the" time.

On Mon, 19 Oct 2020 22:18:30 +0300 uyilmaz wrote:
> Thanks for taking time to write a detailed answer.
> [...]

--
uyilmaz
Solr tag cloud - words and counts
I have been trying to find a way to do this in Solr for a while: perform a query and, for a text_general field in the result set, find each term's number of occurrences.

- I tried the Terms Component, but it cannot restrict the result set with a query.
- I tried faceting on the field. Since it's a text_general field it doesn't have docValues, and the cardinality is very high (millions of documents * tens of words in each field), so it works but it's very slow and sometimes times out.
- I tried the significantTerms streaming expression, but logically it is not the same as what I'm looking for. It gives the words that occur frequently in the result set but not as frequently outside it, so it is better suited to finding frequency anomalies than to simple counts.

Do you have any suggestions?

Regards

--
uyilmaz
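For comparison, the counts being asked for are easy to state on the client side. The sketch below counts terms over field values that have already been fetched. It uses a naive lowercase word split rather than Solr's text_general analysis chain, and term_counts is a hypothetical helper, so its results would not match Solr's tokens exactly.

```python
import re
from collections import Counter


def term_counts(texts, top_n=10):
    """Count term occurrences across the given field values (naive lowercase word split)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Field values as they might come back from a query; made-up sample data.
    print(term_counts(["Solr is fast", "solr facets are fast", "tag cloud"], top_n=3))
```

With millions of matching documents, the field values would need to be streamed and counted incrementally rather than loaded at once; the Counter above already supports that, since it is updated one value at a time.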
Re: docValues usage
Hi,

I'm by no means an expert on this, so if anyone sees a mistake please correct me.

I think you need to index this field, since boost functions are added to the query as optional clauses (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter). It's like boosting a regular field by putting ^2 next to it in a query. Storing it or enabling docValues will unnecessarily consume space/memory.

On Tue, 3 Nov 2020 16:10:50 -0800 Wei wrote:
> Hi,
>
> I have a couple of primitive single value numeric type fields, their
> values are used in boosting functions, but not used in sort/facet. or in
> returned response. Should I use docValues for them in the schema? I can
> think of the following options:
>
> 1) indexed=true, stored=true, docValues=false
> 2) indexed=true, stored=false, docValues=true
> 3) indexed=false, stored=false, docValues=true
>
> What would be the performance implications for these options?
>
> Best,
> Wei

--
uyilmaz
when to use stored over docValues and useDocValuesAsStored
Hi,

I heavily use streaming expressions and facets, or export large amounts of data from Solr to Spark for analysis. Please correct me if I'm wrong:

+ requesting a non-docValues field in a response causes the whole document to be decompressed and read from disk
+ streaming expressions and the export handler require every field they read to have docValues
- docValues increases index size and therefore the memory requirement; stored only uses disk space
- stored preserves the order of multivalued fields

It seems stored is only useful when I have a multivalued field whose index-time order I care about, and since I will be using the export handler, it will use docValues anyway and lose the order. So is there any case where I need stored=true?

Best,
ufuk

--
uyilmaz