Re: Tokenization and SAI query syntax

2023-08-14 Thread Jon Haddad
biguous. > >>> > >>> If we object to allowing expr embedding of a subset of the Lucene syntax, > >>> I can't imagine we're okay w/ then jamming a subset of that syntax into > >>> the main CQL grammar. > >>> > >>>

Re: Tokenization and SAI query syntax

2023-08-13 Thread Caleb Rackliffe
ubset of that syntax into the >>> main CQL grammar. >>> >>> If we want to do this in non-expr CQL space, I think using functions >>> (ignoring the implementation complexity) at least removes ambiguity. >>> "token_match", "phrase_

Re: Tokenization and SAI query syntax

2023-08-13 Thread Jon Haddad
o this in non-expr CQL space, I think using functions > > (ignoring the implementation complexity) at least removes ambiguity. > > "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be > > pretty clear, althou

Re: Tokenization and SAI query syntax

2023-08-07 Thread Benedict
Yep, this sounds like the potentially least bad approach for now. Sorry Caleb, I jumped in without properly reading the thread and assumed we were proposing changes to CQL. If it’s clear we’re dropping into a sub-language and providing a sub-query to it that’s SAI-specific, that gives us pretty

Re: Tokenization and SAI query syntax

2023-08-07 Thread Josh McKenzie
want to do this in non-expr CQL space, I think using functions > (ignoring the implementation complexity) at least removes ambiguity. > "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be > pretty clear, although the

Re: Tokenization and SAI query syntax

2023-08-07 Thread Caleb Rackliffe
mentation complexity) at least removes ambiguity. "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be pretty clear, although there may be other problems. For instance, what happens when I try to use "token_match" on an inde

Re: Tokenization and SAI query syntax

2023-08-07 Thread Atri Sharma
ugh it seems to have >> some >> > > differences from the feature set in SAI. >> > > >> >> > > >> That said, there seems to be enough of an overlap that it would >> make >> > > sense to consider using LIKE in

Re: Tokenization and SAI query syntax

2023-08-07 Thread J. D. Jordan
> > would be a little odd if we have different syntax for different indexes. > > >> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md > > >> > > >> I think one complication here is that there seems to be a desire, that >

Re: Tokenization and SAI query syntax

2023-08-07 Thread Caleb Rackliffe
I. >> > > >> >> > > >> That said, there seems to be enough of an overlap that it would >> make >> > > sense to consider using LIKE in the same manner, doesn't it? I think >> it >> > > would be a little odd if we have d

Re: Tokenization and SAI query syntax

2023-08-07 Thread Benedict
doesn't it?  I think it > > would be a little odd if we have different syntax for different indexes. > > >> > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md > > >> > > >> I think one complication here is that there seems t

Re: Tokenization and SAI query syntax

2023-08-07 Thread Mike Adamson
syntax for different > indexes. > > > >> > > > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md > > > >> > > > >> I think one complication here is that there seems to be a desire, > that > > > I very mu

Re: Tokenization and SAI query syntax

2023-08-03 Thread Jon Haddad
ication here is that there seems to be a desire, that > > I very much agree with, to expose as much of the underlying flexibility of > > Lucene as much as possible. If it means we use Caleb's suggestion, I'd ask > > that the queries that SASI and SAI both support us

Re: Tokenization and SAI query syntax

2023-08-03 Thread Jon Haddad
ferent indexes. > >> > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md > >> > >> I think one complication here is that there seems to be a desire, that I > >> very much agree with, to expose as much of the underlying flexibility of >

Re: Tokenization and SAI query syntax

2023-08-02 Thread Caleb Rackliffe
> >> > >> I think one complication here is that there seems to be a desire, that > I very much agree with, to expose as much of the underlying flexibility of > Lucene as much as possible. If it means we use Caleb's suggestion, I'd ask > that the queries that S

Re: Tokenization and SAI query syntax

2023-08-02 Thread Jeremiah Jordan
at I >> very much agree with, to expose as much of the underlying flexibility of >> Lucene as much as possible. If it means we use Caleb's suggestion, I'd ask >> that the queries that SASI and SAI both support use the same syntax, even if >> it means there's t

Re: Tokenization and SAI query syntax

2023-08-02 Thread J. D. Jordan
seems to be a desire, that I very > much agree with, to expose as much of the underlying flexibility of Lucene as > much as possible. If it means we use Caleb's suggestion, I'd ask that the > queries that SASI and SAI both support use the same syntax, even if it means

Re: Tokenization and SAI query syntax

2023-08-02 Thread Jon Haddad
ere seems to be a desire, that I very much agree with, to expose as much of the underlying flexibility of Lucene as much as possible. If it means we use Caleb's suggestion, I'd ask that the queries that SASI and SAI both support use the same syntax, even if it means there's two

Re: Tokenization and SAI query syntax

2023-08-01 Thread Caleb Rackliffe
y isn't my favorite choice, but it's there. The ElasticSearch match query syntax - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html Again, not my favorite. It's verbose, and probably too powerful for us. ElasticSearch's documentat

Re: Tokenization and SAI query syntax

2023-07-24 Thread Josh McKenzie
general structure of the MATCH operator. That said, I also think CONTAINS loses something important that you allude to here Jonathan: > with corresponding query-time tokenization and analysis. This means that the > query term is not always a substring of the original string! Besides obvious

Re: Tokenization and SAI query syntax

2023-07-24 Thread Benedict
it creates an ambiguity when the queried column belongs to the primary key. For some queries we wouldn't know whether the user wants a primary key query using regular equality or an index query using the analyzer. `term_matches(column, term)` seems quite clear and hard to misinterpret, but it

Re: Tokenization and SAI query syntax

2023-07-24 Thread Andrés de la Peña
`column = term` is definitely problematic because it creates an ambiguity when the queried column belongs to the primary key. For some queries we wouldn't know whether the user wants a primary key query using regular equality or an index query using the analyzer. `term_matches(column,

Tokenization and SAI query syntax

2023-07-24 Thread Jonathan Ellis
corresponding query-time tokenization and analysis. This means that the query term is not always a substring of the original string! Besides obvious transformations like lowercasing, you have things like PhoneticFilter available as well. Here are my thoughts on some of the options: `column = term`. This
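A toy sketch of why an analyzed match is not a substring match. This is not SAI's or Lucene's analyzer: `analyze` and `term_matches` are hypothetical names, and the analysis chain (lowercase, split on non-alphanumerics) is an assumption chosen only to illustrate the point made above.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumeric characters.

    A stand-in for a real analysis chain; not Lucene or SAI code.
    """
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def term_matches(column_value, term):
    """True if every analyzed token of `term` appears among the column's tokens."""
    column_tokens = set(analyze(column_value))
    return all(tok in column_tokens for tok in analyze(term))

stored = "The Quick-Brown Fox"
query = "QUICK"

# The raw query term is not a substring of the stored string...
is_substring = query in stored
# ...yet it matches once both sides go through the same analyzer.
is_match = term_matches(stored, query)
print(is_substring, is_match)
```

With a phonetic or stemming filter in the chain the gap between "substring" and "match" would be wider still, which is the part that `column = term` or `CONTAINS` tends to hide.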

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-07-10 Thread Jacek Lewandowski
Given what was said, I propose rephrasing this functionality to limit the memory used to execute a query. We will not expose the page size measured in bytes to the client. Instead, an upper limit will be a guardrail so that we won't fetch more data. Aggregation query with grouping is a sp

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-13 Thread Benjamin Lerer
/native_protocol_v5.spec#L1247-L1253 >> >> - Clients should not rely on the actual size of the result set returned >> to >> decide if there are more results to fetch or not. Instead, they >> should always >> check the Has_more_pages flag (unless they did

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
ad, they should > always > check the Has_more_pages flag (unless they did not enable paging for > the query > obviously). Clients should also not assert that no result will have > more than <result_page_size> > results. While the current implementation always > respects > the exact val

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
hey did not enable paging for the > query > obviously). Clients should also not assert that no result will have more > than <result_page_size> > results. While the current implementation always > respects > the exact value of <result_page_size>, we reserve the right to return > slightly sm

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeremiah Jordan
As long as it is valid in the paging protocol to return a short page, but still say “there are more pages”, I think that is fine to do that. For an actual LIMIT that is part of the user query, I think the server must always have returned all data that fits into the LIMIT when all pages have been
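A minimal client-side sketch of the "short page, but more pages remain" case. `FakeServer`, `Page`, and `fetch_all` are hypothetical names and not any driver's API. The only point is that the loop stops on the has-more-pages flag and never on the number of rows in a page.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Page:
    rows: List[int]
    has_more_pages: bool
    paging_state: Optional[int]  # opaque to a real client; a plain offset here

class FakeServer:
    """Hypothetical server that may return pages shorter than requested."""

    def __init__(self, rows, max_rows_per_page):
        self.rows = rows
        self.max_rows_per_page = max_rows_per_page  # server-side cap (assumed)

    def fetch(self, page_size, paging_state=None):
        start = paging_state or 0
        # The server is free to return fewer rows than the client asked for.
        end = min(start + min(page_size, self.max_rows_per_page), len(self.rows))
        more = end < len(self.rows)
        return Page(self.rows[start:end], more, end if more else None)

def fetch_all(server, page_size):
    """Drain a result set, trusting only the has_more_pages flag."""
    out, state = [], None
    while True:
        page = server.fetch(page_size, state)
        out.extend(page.rows)
        if not page.has_more_pages:
            return out
        state = page.paging_state

server = FakeServer(list(range(10)), max_rows_per_page=3)
result = fetch_all(server, page_size=5)
print(result)
```

A client that instead stopped as soon as it received fewer than `page_size` rows would give up after the first page here and silently lose data.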

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
Yeah, my bad. I have paging on the brain. Seriously. I can't think of a use-case in which a LIMIT based on # bytes makes sense from a user perspective. On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote: > > > On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer wrote: >>> If you have rows that var

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeff Jirsa
On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer wrote: > If you have rows that vary significantly in their size, your latencies >> could end up being pretty unpredictable using a LIMIT BY . Being >> able to specify a limit by bytes at the driver / API level would allow app >> devs to get more dete

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benjamin Lerer
he user in most drivers. It is simply a way to optimize your memory > usage from end to end. > > I do not like the approach of using both of them simultaneously because if > you request a page with a certain amount of rows and do not get it then it > is really confusing and can be a pro

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Josh McKenzie
taneously because if >>> you request a page with a certain amount of rows and do not get it then it >>> is really confusing and can be a problem for some use cases. We have users >>> keeping their session open and the page information to display page of data. >>

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
then it > is really confusing and can be a problem for some use cases. We have users > keeping their session open and the page information to display page of data. > > On Mon, Jun 12, 2023 at 09:08, Jacek Lewandowski < > lewandowski.ja...@gmail.com> wrote: > >> Hi, >

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Limiting the amount of returned data in bytes in addition to the row limit could be helpful when applied transparently by the server as a kind of guardrail. The server could fail the query if it exceeds some administratively imposed limit on the configuration level, WDYT? Mon, Jun 12, 2023 at 11

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benedict
, I was working on limiting query results by their size expressed in bytes, and some questions arose that I'd like to bring to the mailing list. The semantics of queries (without aggregation) - data limits are applied on the raw data returned from replicas - while it works fine for the row numbe

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benjamin Lerer
amount of rows and do not get it then it is really confusing and can be a problem for some use cases. We have users keeping their session open and the page information to display page of data. On Mon, Jun 12, 2023 at 09:08, Jacek Lewandowski wrote: > Hi, > > I was working on limit

[DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jacek Lewandowski
Hi, I was working on limiting query results by their size expressed in bytes, and some questions arose that I'd like to bring to the mailing list. The semantics of queries (without aggregation) - data limits are applied on the raw data returned from replicas - while it works fine for th

Re: Vector search demo, and query syntax

2023-05-24 Thread DuyHai Doan
not supported. > > I also put together a demo that uses this branch to provide context to > OpenAI’s GPT, available here: *https://github.com/jbellis/cassgpt* > <https://github.com/jbellis/cassgpt>. > > Here is the query that gets executed: > > SELECT id, start,

Re: Vector search demo, and query syntax

2023-05-24 Thread Josh McKenzie
handles distributed scatter/gather. Updates and deletes to vector >> values are still not supported. >> >> I also put together a demo that uses this branch to provide context to >> OpenAI’s GPT, available here: _https://github.com/jbellis/cassgpt_. >>

Re: Vector search demo, and query syntax

2023-05-23 Thread Jeremiah D Jordan
scatter/gather. Updates and deletes to vector > values are still not supported. > > I also put together a demo that uses this branch to provide context to > OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt. > > Here is the query that gets executed: >

Re: Vector search demo, and query syntax

2023-05-23 Thread Patrick McFadin
>> >>> *I propose that we adopt `ORDER BY` syntax, supporting it for vector >>> indexes first and eventually for all SAI indexes. So this query would >>> become SELECT id, start, end, text FROM >>> {self.keyspace}.{self.table} ORDER BY embe

Re: Vector search demo, and query syntax

2023-05-23 Thread Jonathan Ellis
? > > Not really a common syntax, but could be useful down the line > > On May 23, 2023, at 12:37 AM, Mick Semb Wever wrote: > > >> *I propose that we adopt `ORDER BY` syntax, supporting it for vector >> indexes first and eventually for all SAI indexes. So this que

Re: Vector search demo, and query syntax

2023-05-23 Thread David Capwell
ver wrote: > >> I propose that we adopt `ORDER BY` syntax, supporting it for vector indexes >> first and eventually for all SAI indexes. So this query would become >> >> SELECT id, start, end, text >> FROM {self.keyspace}.{self.table} >> ORD

Re: Vector search demo, and query syntax

2023-05-23 Thread Mick Semb Wever
> > > *I propose that we adopt `ORDER BY` syntax, supporting it for vector > indexes first and eventually for all SAI indexes. So this query would > become SELECT id, start, end, text FROM > {self.keyspace}.{self.table} ORDER BY embedding ANN OF %s LIMIT %s*

Vector search demo, and query syntax

2023-05-22 Thread Jonathan Ellis
ibuted scatter/gather. Updates and deletes to vector values are still not supported. I also put together a demo that uses this branch to provide context to OpenAI’s GPT, available here: https://github.com/jbellis/cassgpt. Here is the query that g

Re: Query

2021-05-12 Thread Joshua McKenzie
Boot camp from 2014 has a lot of fundamentals that are still quite valid: https://www.slideshare.net/joshmckenzie/ The "Contributing to Cassandra" section of the docs can also help you get situated: https://cassandra.apache.org/doc/latest/development/ On Wed, May 12, 2021 at 9:16 AM Manish G wr

Query

2021-05-12 Thread Manish G
Hi All, Is there any documentation someone new can look into to understand the Cassandra code base? Manish

Re: Is my range read query behaving strange ?

2019-06-11 Thread Laxmikant Upadhyay
Does a range query ignore purgeable tombstones (which crossed the grace period) in some cases? On Tue, Jun 11, 2019, 2:56 PM Laxmikant Upadhyay wrote: > In a 3 node cassandra 2.1.16 cluster where, one node has old mutation and > two nodes have evict-able (crossed gc grace period) tombstone produ

Reply: [Marketing email] Re: Would cqlsh describe command send query to cassandra server ?

2018-08-07 Thread 陈仕(shichen)-技术产品中心
in.le...@datastax.com] Sent: August 7, 2018 19:45 To: dev@cassandra.apache.org Subject: [Marketing email] Re: Would cqlsh describe command send query to cassandra server ? Hi, DESCRIBE commands are handled at the driver level, not at the server level. The drivers fetch metadata and keep it up to date. When a DESCRIBE comma

Re: Would cqlsh describe command send query to cassandra server ?

2018-08-07 Thread Benjamin Lerer
tput. > > So I wonder whether cqlsh fetch keyspace’s and table’s metadata just from > local and not send query to server node ? > > Thanks. >

Would cqlsh describe command send query to cassandra server ?

2018-08-07 Thread 陈仕(shichen)-技术产品中心
whether cqlsh fetches the keyspace and table metadata just locally and does not send a query to the server node? Thanks.

Re: Real time bad query logging framework in C*

2018-06-20 Thread Jaydeep Chovatia
ybe we could >really use a framework for that, I don't know. I agree, Cassandra already has details coming out as part of metrics, logging (like tombstones), etc. Current log messages (tombstone messages, large partition messages, slow query messages, etc.) are very useful, but one im

Re: Real time bad query logging framework in C*

2018-06-20 Thread Stefan Podkowinski
Jaydeep, thanks for taking this discussion to the dev list. I think it's the best place to introduce new ideas, discuss them in general and how they potentially fit in. As already mentioned in the ticket, I do share your assessment that we should try to make operational issues more visible to

Real time bad query logging framework in C*

2018-06-19 Thread Jaydeep Chovatia
Hi, We have worked on developing a common framework to detect/log anti-patterns/bad queries in Cassandra. The target for this effort would be to reduce the burden on ops to handle Cassandra at large scale, as well as help beginners quickly identify performance problems with Cassandra. Initially

Re: Customize Cassandra to execute a query multiple times

2017-07-27 Thread benjamin roth
ld then connect to Cassandra and execute the stored > query(ies). > > Given that there is no client waiting for a response, then latency is not > even a (major) issue, so the extra network hop is probably of little > consequence. > > Why would you want this to be an integral pa

Re: Customize Cassandra to execute a query multiple times

2017-07-27 Thread Marco Massenzio
Why not simply have a microservice that does this for you? It may expose an API that allows you to store queries and/or conditions that trigger the queries (maybe time elapsed, an alert generated, whatever...) and it would then connect to Cassandra and execute the stored query(ies). Given

Re: Customize Cassandra to execute a query multiple times

2017-07-26 Thread Ke Wang
ite requests by the server itself. Can you describe a little bit more on how to implement serializing the mutation into the table? Best, Ke 2017-07-26 22:57 GMT-07:00 Jeff Jirsa : > > > On 2017-07-26 22:19 (-0700), Ke Wang wrote: > > Hello all, > > > > Is there a way

Re: Customize Cassandra to execute a query multiple times

2017-07-26 Thread Jeff Jirsa
On 2017-07-26 22:19 (-0700), Ke Wang wrote: > Hello all, > > Is there a way to customize Cassandra to execute a query multiple times? > There's always a way... > My use case is the following. When the Cassandra server receives queries > from remote clients, besides

Customize Cassandra to execute a query multiple times

2017-07-26 Thread Ke Wang
Hello all, Is there a way to customize Cassandra to execute a query multiple times? My use case is the following. When the Cassandra server receives queries from remote clients, besides executing those queries, the server also stores the queries. In the future, the server can re-execute stored

Re: OOM with one query.

2016-09-07 Thread Eduardo Alonso
onso > > Vía de las dos Castillas, 33, Ática 4, 3ª Planta > > 28224 Pozuelo de Alarcón, Madrid > > Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd > > <https://twitter.com/StratioBD>* > > > > 2016-09-06 15:15 GMT+02:00 Eduardo Alonso : > > > >

Re: OOM with one query.

2016-09-06 Thread Benjamin Lerer
Alarcón, Madrid > Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd > <https://twitter.com/StratioBD>* > > 2016-09-06 15:15 GMT+02:00 Eduardo Alonso : > >> Hi to all: >> >> I think i have found a bug, serious one. >> >> I have found a INSERT qu

Re: OOM with one query.

2016-09-06 Thread Eduardo Alonso
> I think i have found a bug, serious one. > > I have found a INSERT query that does not validate the params and accept > an String as a valid value for a List. This produce an out of > memory exception due to java heap in the server. > > I have coded a very simple maven projec

OOM with one query.

2016-09-06 Thread Eduardo Alonso
Hi to all: I think I have found a bug, a serious one. I have found an INSERT query that does not validate the params and accepts a String as a valid value for a List. This produces an out-of-memory exception due to Java heap in the server. I have coded a very simple Maven project in Java to

Re: unchecked_tombstone_compaction - query

2015-10-15 Thread Paulo Motta
Hello Deepak, The dev@cassandra list is exclusively for development announcements and discussions, so I will reply to users@cassandra as someone else might have a similar question. Basically, there is a pre-check that defines which sstables are eligible for single-sstable tombstone compaction, and a

Slice Query Issue

2013-12-09 Thread mahesh rajamani
Hi, I am using Cassandra version 2.0.2. We are using wide rows (approx. 5000 columns). During the process, I mark a few columns in the row with a TTL (sometimes 100 consecutive columns) and delete them. In this scenario, if I do a slice query on that record, some slices are not returned. Also, if I do a get

Does Cassandra support parallel query processing?

2012-05-20 Thread Majid Azimi
Hi guys, I'm going to build a warehouse with Cassandra. There are a lot of range and aggregate queries. Does Cassandra support parallel query processing (both on a single box and on a cluster)?

Re: implementation choice with regard to multiple range slice query filters

2012-04-03 Thread David Alves
cool, thanks. -david On Apr 4, 2012, at 1:01 AM, Jonathan Ellis wrote: > You need more than column_index_size_in_kb worth of column data for it > to generate row header index entries. We have a cassandra.yaml in > test/conf that sets that extra low, to 4, to make that easier. "ant > test" sets

Re: implementation choice with regard to multiple range slice query filters

2012-04-03 Thread Jonathan Ellis
You need more than column_index_size_in_kb worth of column data for it to generate row header index entries. We have a cassandra.yaml in test/conf that sets that extra low, to 4, to make that easier. "ant test" sets up the environment to point to that yaml, but if you're running it from your IDE

Re: implementation choice with regard to multiple range slice query filters

2012-04-03 Thread David Alves
Hi Jonathan: Thanks for the tip. Although the first option I proposed would not incur that penalty, it would not take advantage of the columns index for the middle ranges. On a related matter, I'm struggling to test the IndexedBlockFetcher implementation (SimpleBlockF

Re: implementation choice with regard to multiple range slice query filters

2012-04-02 Thread Jonathan Ellis
That would work, but I think the best approach would actually push multiple ranges down into ISR itself, otherwise you could waste a lot of time reading the row header redundantly (the skipBloomFilter/deserializeIndex part). The tricky part would be getting IndexedBlockFetcher to not do extra work

implementation choice with regard to multiple range slice query filters

2012-04-02 Thread David Alves
Hi guys, I'm a PhD student and I'm trying to dip my feet in the water wrt Cassandra development, as I'm a long time fan. I'm implementing CASSANDRA-3885 which pertains to supporting returning multiple slices of a row. After looking around at the portion of the

Re: digest query: why relying on value?

2012-04-02 Thread Nicolas Romanetti
Right on spot thanks! It would be interesting to have some metrics on how rare is the case: // break ties by comparing values. if (timestamp() == column.timestamp()) return value().compareTo(column.value()) < 0 ? column : this; If extremely rare, it would be may be mo
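For readers following along, here is a small sketch of the tie-break quoted above. It is an illustration, not Cassandra's `Column.reconcile`: the `Column` class and `reconcile` function are hypothetical, and Python compares `bytes` as unsigned values, which may differ from the ordering the Java code uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int

def reconcile(a, b):
    """Pick the winning version of a column.

    Higher timestamp wins. On equal timestamps the tie is broken by
    comparing values, so the greater value wins (mirroring the quoted
    `value().compareTo(column.value()) < 0 ? column : this`).
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return b if a.value < b.value else a

x = Column("col", b"apple", timestamp=100)
y = Column("col", b"banana", timestamp=100)

# Same name and timestamp, different values: the winner does not depend
# on which replica's copy is passed first.
winner_xy = reconcile(x, y)
winner_yx = reconcile(y, x)
print(winner_xy.value, winner_yx.value)
```

Because two replicas can hold the same timestamp with different values, a digest built from timestamps alone could not tell them apart, which is the reason the value feeds into the digest.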

Re: digest query: why relying on value?

2012-04-02 Thread Sylvain Lebresne
A digest query is about making 1 digest for many columns, not 1 digest per column. If it were 1 digest per column, then yes, the timestamp would be an option. -- Sylvain On Mon, Apr 2, 2012 at 4:25 PM, Jonathan Ellis wrote: > Look at Column.reconcile. > > On Mon, Apr 2, 2012 a

Re: digest query: why relying on value?

2012-04-02 Thread Jonathan Ellis
Look at Column.reconcile. On Mon, Apr 2, 2012 at 9:17 AM, Nicolas Romanetti wrote: >  Hello, > > Why does the digest read response include a hash of the column value? Isn't > the timestamp sufficient? > > May be an answer: > Is the value hash computed to cope with (I presume rare) race condition

digest query: why relying on value?

2012-04-02 Thread Nicolas Romanetti
Hello, Why does the digest read response include a hash of the column value? Isn't the timestamp sufficient? Maybe an answer: Is the value hash computed to cope with a (I presume rare) race condition scenario where 2 nodes would end up with the same col. name and same col. timestamp but with a diffe

Re: Cassandra super column query

2012-02-16 Thread krishna melkote
Hello, I have recently started using Cassandra. I have designed a super column as below. It is basically a set of university results. I am using Hector java API to interface with Cassandra. I am struggling a bit to find what is the proper way to query the results for a particular registration

Re: Can we do a more precise query in Cassandra ?

2010-03-25 Thread Jonathan Ellis
This will have to wait until we have secondary index support, at the least. (https://issues.apache.org/jira/browse/CASSANDRA-749) 2010/3/25 郭鹏 : > Hi All: > > I am thinking a more precise query in Cassandra: > > Could we have a query API like this : > > List> get_slice_c

Can we do a more precise query in Cassandra ?

2010-03-25 Thread 郭鹏
Hi All: I am thinking of a more precise query in Cassandra: Could we have a query API like this: List> get_slice_condition(String keyspace, List keys, ColumnParent column_parent, Map queryConditions, int consistency_level) So we could use this API to query more precise data like age colum