> If we object to allowing expr embedding of a subset of the Lucene syntax, I
> can't imagine we're okay w/ then jamming a subset of that syntax into the
> main CQL grammar.
>
> If we want to do this in non-expr CQL space, I think using functions
> (ignoring the implementation complexity) at least removes ambiguity.
> "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be
> pretty clear, although there may be other problems. For instance, what
> happens when I try to use "token_match" on an index…

Yep, this sounds like the potentially least bad approach for now. Sorry Caleb,
I jumped in without properly reading the thread and assumed we were proposing
changes to CQL. If it's clear we're dropping into a sub-language and providing
a sub-query to it that's SAI-specific, that gives us pretty…
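As a rough illustration of the function-style predicates being floated here
(nothing below is committed syntax; the table, column, and terms are made up,
only the function names come from this thread):

    -- Hypothetical function-based text predicates, not existing CQL.
    SELECT * FROM ks.messages WHERE token_match(body, 'cassandra index');
    SELECT * FROM ks.messages WHERE phrase_match(body, 'storage attached index');
    SELECT * FROM ks.messages WHERE token_like(body, 'cassan%');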
…although it seems to have some differences from the feature set in SAI.

That said, there seems to be enough of an overlap that it would make sense to
consider using LIKE in the same manner, doesn't it? I think it would be a
little odd if we have different syntax for different indexes.

https://github.com/apache/cassandra/blob/trunk/doc/SASI.md

I think one complication here is that there seems to be a desire, that I very
much agree with, to expose as much of the underlying flexibility of Lucene as
possible. If it means we use Caleb's suggestion, I'd ask that the queries that
SASI and SAI both support use the same syntax, even if it means there's two…
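For reference, the SASI documentation linked above expresses its analyzed and
tokenized matching through LIKE, roughly along these lines (the table and
values are illustrative, and which forms are allowed depends on the index
mode):

    -- Illustrative SASI-style LIKE queries.
    SELECT * FROM ks.person WHERE first_name LIKE 'Jon%';   -- prefix match
    SELECT * FROM ks.person WHERE bio LIKE '%cassandra%';   -- contains match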
…isn't my favorite choice, but it's there.

The ElasticSearch match query syntax -
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
Again, not my favorite. It's verbose, and probably too powerful for us.
ElasticSearch's documentation…

…general structure of the MATCH operator. That said, I also think CONTAINS
loses something important that you allude to here, Jonathan:

> with corresponding query-time tokenization and analysis. This means that the
> query term is not always a substring of the original string!

`column = term` is definitively problematic because it creates an ambiguity
when the queried column belongs to the primary key. For some queries we
wouldn't know whether the user wants a primary key query using regular
equality or an index query using the analyzer. `term_matches(column, term)`
seems quite clear and hard to misinterpret, but it…

…with corresponding query-time tokenization and analysis. This means that the
query term is not always a substring of the original string! Besides obvious
transformations like lowercasing, you have things like PhoneticFilter
available as well.

Here are my thoughts on some of the options:

`column = term`. This…
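A small sketch of the primary-key ambiguity described above (the schema is
hypothetical, and term_matches is a proposed name from this thread, not
existing syntax):

    -- Hypothetical table whose partition key is also an analyzed text column.
    CREATE TABLE ks.articles (title text PRIMARY KEY, body text);

    -- Ambiguous if title also has an analyzed index: exact key lookup, or
    -- analyzer-based match?
    SELECT * FROM ks.articles WHERE title = 'cassandra';

    -- The function form stays unambiguous.
    SELECT * FROM ks.articles WHERE term_matches(title, 'cassandra');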
Given what was said, I propose rephrasing this functionality as limiting the
memory used to execute a query. We will not expose the page size measured in
bytes to the client. Instead, an upper limit will act as a guardrail so that
we won't fetch more data than that.

Aggregation query with grouping is a special case…
…/native_protocol_v5.spec#L1247-L1253

> - Clients should not rely on the actual size of the result set returned to
>   decide if there are more results to fetch or not. Instead, they should
>   always check the Has_more_pages flag (unless they did not enable paging
>   for the query obviously). Clients should also not assert that no result
>   will have more than <result_page_size> results. While the current
>   implementation always respects the exact value of <result_page_size>, we
>   reserve the right to return slightly smaller…
As long as it is valid in the paging protocol to return a short page, but
still say "there are more pages", I think that is fine to do. For an actual
LIMIT that is part of the user query, I think the server must always have
returned all data that fits into the LIMIT when all pages have been…

Yeah, my bad. I have paging on the brain. Seriously.

I can't think of a use-case in which a LIMIT based on # bytes makes sense from
a user perspective.
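To keep the two concepts apart, a small illustration (table and values are
made up): the LIMIT below is part of the query's semantics, while the page
size discussed in this thread - whether counted in rows or in bytes - is a
driver/protocol-level fetch setting that never appears in the CQL text.

    -- At most 100 rows total, however many pages it takes to deliver them.
    SELECT * FROM ks.events WHERE user_id = 42 LIMIT 100;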
On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote:
> On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer wrote:
>> If you have rows that vary significantly in their size, your latencies
>> could end up being pretty unpredictable using a LIMIT BY <rows>. Being able
>> to specify a limit by bytes at the driver / API level would allow app devs
>> to get more deterministic…

…the user in most drivers. It is simply a way to optimize your memory usage
from end to end.

I do not like the approach of using both of them simultaneously because if
you request a page with a certain number of rows and do not get it, then it
is really confusing and can be a problem for some use cases. We have users
keeping their session open and the page information to display pages of data.
Limiting the amount of returned data in bytes, in addition to the row limit,
could be helpful when applied transparently by the server as a kind of
guardrail. The server could fail the query if it exceeds some administratively
imposed limit at the configuration level, WDYT?

On Mon, Jun 12, 2023 at 09:08, Jacek Lewandowski
<lewandowski.ja...@gmail.com> wrote:

> Hi,
>
> I was working on limiting query results by their size expressed in bytes,
> and some questions arose that I'd like to bring to the mailing list.
>
> The semantics of queries (without aggregation) - data limits are applied on
> the raw data returned from replicas - while it works fine for the row
> number…
…handles distributed scatter/gather. Updates and deletes to vector values are
still not supported.

I also put together a demo that uses this branch to provide context to
OpenAI's GPT, available here: https://github.com/jbellis/cassgpt.

Here is the query that gets executed:

SELECT id, start, …

Not really a common syntax, but could be useful down the line.

On May 23, 2023, at 12:37 AM, Mick Semb Wever wrote:

>> I propose that we adopt `ORDER BY` syntax, supporting it for vector indexes
>> first and eventually for all SAI indexes. So this query would become
>>
>> SELECT id, start, end, text
>> FROM {self.keyspace}.{self.table}
>> ORDER BY embedding ANN OF %s
>> LIMIT %s
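For context, one concrete (purely illustrative) instantiation of that query
shape, written with the vector type and SAI index syntax that eventually
shipped; the table name, columns, and dimension are made up:

    -- Hypothetical schema; only the ORDER BY ... ANN OF ... LIMIT shape comes
    -- from the proposal above.
    CREATE TABLE ks.transcripts (
        id uuid PRIMARY KEY,
        start int,
        end int,
        text text,
        embedding vector<float, 3>
    );

    CREATE CUSTOM INDEX transcripts_embedding_idx
        ON ks.transcripts (embedding) USING 'StorageAttachedIndex';

    SELECT id, start, end, text
    FROM ks.transcripts
    ORDER BY embedding ANN OF [0.12, 0.05, 0.33]
    LIMIT 10;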
Boot camp from 2014 has a lot of fundamentals that are still quite valid:
https://www.slideshare.net/joshmckenzie/

The "Contributing to Cassandra" section of the docs can also help you get
situated: https://cassandra.apache.org/doc/latest/development/

On Wed, May 12, 2021 at 9:16 AM Manish G wrote:

> Hi All,
>
> Is there any documentation someone new can look into to understand the
> Cassandra code base?
>
> Manish
Does a range query ignore purgeable tombstones (which have crossed the grace
period) in some cases?

On Tue, Jun 11, 2019, 2:56 PM Laxmikant Upadhyay wrote:

> In a 3-node Cassandra 2.1.16 cluster where one node has an old mutation and
> two nodes have an evictable (crossed gc grace period) tombstone produced…
…in.le...@datastax.com]
Sent: 2018-08-07 19:45
To: dev@cassandra.apache.org
Subject: [Marketing email] Re: Would cqlsh describe command send query to cassandra server?

Hi,

DESCRIBE commands are handled at the driver level, not at the server level.
The drivers fetch metadata and keep it up to date. When a DESCRIBE command…

…output.

> So I wonder whether cqlsh fetches the keyspace's and table's metadata just
> from local state and does not send a query to the server node?
>
> Thanks.
…maybe we could really use a framework for that, I don't know.

I agree, Cassandra already has details coming out as part of metrics, logging
(like tombstones), etc. The current log messages (tombstone messages, large
partition messages, slow query messages, etc.) are very useful, but one im…

Jaydeep, thanks for taking this discussion to the dev list. I think it's the
best place to introduce new ideas, discuss them in general and how they
potentially fit in. As already mentioned in the ticket, I do share your
assessment that we should try to make operational issues more visible to…

Hi,

We have worked on developing a common framework to detect and log
anti-patterns and bad queries in Cassandra. The target for this effort is to
reduce the burden on ops of handling Cassandra at large scale, as well as to
help beginners quickly identify performance problems with Cassandra.
Initially…
Why not simply have a microservice that does this for you?

It may expose an API that allows you to either store queries and/or
conditions that trigger the queries (maybe time elapsed, an alert generated,
whatever...) and it would then connect to Cassandra and execute the stored
query(ies).

Given that there is no client waiting for a response, latency is not even a
(major) issue, so the extra network hop is probably of little consequence.

Why would you want this to be an integral part…
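Purely as a sketch of that suggestion (all names below are made up; the
thread only proposes the idea of persisting queries and their triggers,
whether in Cassandra itself or elsewhere):

    -- Hypothetical table such a microservice could use to persist the
    -- queries it replays.
    CREATE TABLE ops.stored_queries (
        id uuid PRIMARY KEY,
        cql text,                 -- the statement to re-execute
        trigger_condition text,   -- e.g. a schedule or an alert name
        last_executed timestamp
    );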
…write requests by the server itself.

Can you describe a little bit more how to implement serializing the mutation
into the table?

Best,
Ke

2017-07-26 22:57 GMT-07:00 Jeff Jirsa:

> On 2017-07-26 22:19 (-0700), Ke Wang wrote:
> > Hello all,
> >
> > Is there a way…
On 2017-07-26 22:19 (-0700), Ke Wang wrote:
> Hello all,
>
> Is there a way to customize Cassandra to execute a query multiple times?

There's always a way...

> My use case is the following. When the Cassandra server receives queries
> from remote clients, besides…
Hello all,

Is there a way to customize Cassandra to execute a query multiple times?

My use case is the following. When the Cassandra server receives queries from
remote clients, besides executing those queries, the server also stores the
queries. In the future, the server can re-execute stored…
Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2016-09-06 15:15 GMT+02:00 Eduardo Alonso:

> Hi to all:
>
> I think I have found a bug, a serious one.
>
> I have found an INSERT query that does not validate the params and accepts
> a String as a valid value for a List. This produces an out of memory
> exception due to java heap in the server.
>
> I have coded a very simple maven project in java to…
Hello Deepak,

The dev@cassandra list is exclusively for development announcements and
discussions, so I will reply to users@cassandra as someone else might have a
similar question.

Basically, there is a pre-check that defines which sstables are eligible for
single-sstable tombstone compaction, and a…
Hi,

I am using Cassandra version 2.0.2. We are using wide rows (approx. 5000
columns). During the process, I mark a few columns in the row with a TTL
(sometimes 100 contiguous columns) and delete them. In this scenario, if I do
a slice query on that record, some slices are not returned. Also, if I do a
get…
Hi guys,

I'm going to build a warehouse with Cassandra. There are a lot of range and
aggregate queries. Does Cassandra support parallel query processing (both on a
single box and across the cluster)?
cool, thanks.
-david

On Apr 4, 2012, at 1:01 AM, Jonathan Ellis wrote:

> You need more than column_index_size_in_kb worth of column data for it to
> generate row header index entries. We have a cassandra.yaml in test/conf
> that sets that extra low, to 4, to make that easier. "ant test" sets up the
> environment to point to that yaml, but if you're running it from your IDE…
Hi,

Jonathan: thanks for the tip. Although the first option I proposed would not
incur that penalty, it would not take advantage of the columns index for the
middle ranges.

On a related matter, I'm struggling to test the IndexedBlockFetcher
implementation (SimpleBlockFetcher…
That would work, but I think the best approach would actually push multiple
ranges down into ISR itself, otherwise you could waste a lot of time reading
the row header redundantly (the skipBloomFilter/deserializeIndex part).

The tricky part would be getting IndexedBlockFetcher to not do extra work…
Hi guys,

I'm a PhD student and I'm trying to dip my feet in the water wrt Cassandra
development, as I'm a long-time fan.

I'm implementing CASSANDRA-3885, which pertains to supporting returning
multiple slices of a row. After looking around at the portion of the…
Right on spot, thanks!

It would be interesting to have some metrics on how rare this case is:

    // break ties by comparing values.
    if (timestamp() == column.timestamp())
        return value().compareTo(column.value()) < 0 ? column : this;

If extremely rare, it would maybe be mo…
A digest query is about making 1 digest for many columns, not 1 digest per
column. If it were 1 digest per column, then yes, the timestamp would be an
option.

--
Sylvain

On Mon, Apr 2, 2012 at 4:25 PM, Jonathan Ellis wrote:
> Look at Column.reconcile.
>
> On Mon, Apr 2, 2012 a…
Look at Column.reconcile.

On Mon, Apr 2, 2012 at 9:17 AM, Nicolas Romanetti wrote:
> Hello,
>
> Why does the digest read response include a hash of the column value? Isn't
> the timestamp sufficient?
>
> Maybe an answer: is the value hash computed to cope with a (I presume rare)
> race condition scenario where 2 nodes would end up with the same col. name
> and same col. timestamp but with a different…
Hello,

I have recently started using Cassandra. I have designed a super column as
below. It is basically a set of university results. I am using the Hector
Java API to interface with Cassandra. I am struggling a bit to find the
proper way to query the results for a particular registration…
This will have to wait until we have secondary index support, at the least.
(https://issues.apache.org/jira/browse/CASSANDRA-749)

2010/3/25 郭鹏:
> Hi All:
>
> I am thinking about a more precise query in Cassandra. Could we have a
> query API like this:
>
> List<…> get_slice_condition(String keyspace, List<…> keys,
>     ColumnParent column_parent, Map<…> queryConditions,
>     int consistency_level)
>
> So we could use this API to query more precise data, like the age colum…
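For context, the secondary-index support referenced in the reply is what later
enabled this kind of per-column filtering directly in CQL; a rough, purely
illustrative modern equivalent (schema and names are made up):

    -- Filtering on a non-key column via a secondary index.
    CREATE TABLE ks.users (id uuid PRIMARY KEY, name text, age int);
    CREATE INDEX users_age_idx ON ks.users (age);

    SELECT * FROM ks.users WHERE age = 30;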