Secondary index creation simultaneously

2017-12-07 Thread Jose Rio Perandones
Hi all,

We are working with Cassandra 3.0.14 using secondary indexes, and we realized 
that it's not possible to create several indexes simultaneously. You can send 
several "CREATE CUSTOM INDEX..." statements, but Cassandra doesn't start 
building one index until the previous one has been completely built.

I've been analyzing how index creation is managed by Cassandra and found the 
following: the SecondaryIndexManager class uses an "asyncExecutor" to run the 
index initialization task, and the asyncExecutor object is initialized this way:
private static final ExecutorService asyncExecutor =
    new JMXEnabledThreadPoolExecutor(1,
                                     StageManager.KEEPALIVE,
                                     TimeUnit.SECONDS,
                                     new LinkedBlockingQueue<>(),
                                     new NamedThreadFactory("SecondaryIndexManagement"),
                                     "internal");

So the asyncExecutor is created with corePoolSize=1 and maximumPoolSize=1, 
and this is why the index creation tasks are serialized.
I've been able to change coreThreads and maximumThreads for this executor via 
JMX, but I haven't found any other way to modify the size of this pool, and I 
don't know why this pool has a "fixed" size of 1.
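A JDK-only sketch of the behaviour described above, without Cassandra: a ThreadPoolExecutor with the same shape (core and maximum size 1, unbounded LinkedBlockingQueue) runs sleeping stand-in "index builds" one at a time, and resizing the live pool lets them overlap. It assumes the JMX CoreThreads/MaximumThreads setters end up calling ThreadPoolExecutor's setCorePoolSize/setMaximumPoolSize; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedPoolDemo {

    // Builds a pool shaped like the SecondaryIndexManagement executor (one core
    // thread, one max thread, unbounded queue), resizes it to poolSize, runs
    // `tasks` sleeping jobs and returns the peak number running at once.
    static int maxConcurrency(int poolSize, int tasks, long sleepMillis) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // Runtime resize (assumed to be what the JMX setters do). With an
        // unbounded queue the pool never grows past its core size, so the core
        // size must change, not just the maximum. Raise the maximum first:
        // newer JDKs reject a core size larger than the maximum.
        pool.setMaximumPoolSize(poolSize);
        pool.setCorePoolSize(poolSize);

        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(sleepMillis); // stands in for building one index
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("pool size 1: peak concurrency " + maxConcurrency(1, 3, 200));
        System.out.println("pool size 3: peak concurrency " + maxConcurrency(3, 3, 200));
    }
}
```

With a size of 1 the peak is always 1; after resizing, the tasks overlap. Raising only the maximum would not help here, because ThreadPoolExecutor only adds threads beyond the core size when the queue is full, and a LinkedBlockingQueue without a capacity never fills.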

Is it possible to change the pool size in any other way?
Could there be a concurrency problem if several indexes are created in parallel, 
and is that why the pool size is set to 1?
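The JMX workaround mentioned above can be scripted with a standard JMX client. A minimal sketch, assuming the default JMX port 7199, no JMX authentication, a node reachable on 127.0.0.1, and that the pool is registered as org.apache.cassandra.internal:type=SecondaryIndexManagement (inferred from the "internal" path and thread factory name in the snippet; check the exact name in a JMX browser first). The attribute names CoreThreads and MaximumThreads are the ones mentioned above. The change lives only in the running JVM, so it presumably reverts to 1 on restart, since the size is hard-coded in the field initializer.

```java
import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ResizeIndexPool {

    // ASSUMPTION: the executor registers as "org.apache.cassandra.<jmxPath>:type=<poolName>".
    static ObjectName poolMBean(String jmxPath, String poolName) {
        try {
            return new ObjectName("org.apache.cassandra." + jmxPath + ":type=" + poolName);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int threads = args.length > 1 ? Integer.parseInt(args[1]) : 2;

        // 7199 is Cassandra's default JMX port; adjust if it was changed.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pool = poolMBean("internal", "SecondaryIndexManagement");

            // Raise the maximum before the core size so core never exceeds
            // maximum (this ordering assumes the pool is being grown).
            mbs.setAttribute(pool, new Attribute("MaximumThreads", threads));
            mbs.setAttribute(pool, new Attribute("CoreThreads", threads));

            System.out.println("CoreThreads=" + mbs.getAttribute(pool, "CoreThreads")
                    + " MaximumThreads=" + mbs.getAttribute(pool, "MaximumThreads"));
        }
    }
}
```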

Thanks in advance.
Best regards.


Some problems with Apache Cassandra

2017-12-07 Thread v.elis...@rubic.pro

Hello, my name is Vladimir, and I have some questions about the Cassandra DBMS.


*PROBLEM 1. How can I increase the speed?*
I installed Cassandra from the repository:
"deb http://www.apache.org/dist/cassandra/debian 311x main"
With the default settings, it runs slowly.
_Comparison with MySQL:_
select count(*) from login_wifly.radacct;
+----------+
| count(*) |
+----------+
|  9810806 |
+----------+

real 0m3.709s   (MySQL, without caching)
user 0m0.000s
sys  0m0.000s

_In Cassandra:_
SELECT count(*) FROM test.radacct;

 count
---------
 9810806

(1 rows)

Warnings :
Aggregation query used without partition key


real    3m7.661s   (Cassandra)
user    0m0.444s
sys     0m0.056s
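The warning above means the count is not restricted to one partition, so a single coordinator has to page through every partition in the cluster. One commonly used way to spread that work is to split the token ring into sub-ranges, run one count per range (possibly in parallel), and sum the results. The sketch below only generates the per-range statements; it assumes the default Murmur3Partitioner (signed 64-bit tokens) and that radacctid is the partition key of test.radacct, which holds for radacct2 but is not confirmed for test.radacct. Executing the statements with a driver is not shown, and the whole table is still read.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeCounts {

    // Murmur3Partitioner tokens are signed 64-bit values. Long.MIN_VALUE is its
    // minimum token and is assumed never to be assigned to a row, so the ranges
    // (start, end] below cover every partition exactly once.
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits the ring into n contiguous (start, end] token ranges. */
    static List<long[]> split(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // BigInteger avoids overflow: MAX - MIN does not fit in a long.
        BigInteger width = MAX.subtract(MIN);
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= n; i++) {
            BigInteger end = (i == n)
                    ? MAX
                    : MIN.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
            start = end;
        }
        return ranges;
    }

    /** Builds one per-range count statement; table and key names are caller-supplied. */
    static String countQuery(String table, String partitionKey, long[] range) {
        return String.format(
                "SELECT count(*) FROM %s WHERE token(%s) > %d AND token(%s) <= %d;",
                table, partitionKey, range[0], partitionKey, range[1]);
    }

    public static void main(String[] args) {
        // Hypothetical: test.radacct partitioned by radacctid, split into 8 ranges.
        for (long[] range : split(8)) {
            System.out.println(countQuery("test.radacct", "radacctid", range));
        }
    }
}
```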


*PROBLEM 2. How do I design the table correctly?*
My test table definition:
create table radacct2 (
radacctid bigint,
acctsessionid text,
acctuniqueid text,
username text,
groupname text,
realm text,
nasid text,
nasipaddress text,
nasportid text,
nasporttype text,
acctstarttime text,
acctstoptime text,
acctsessiontime bigint,
acctauthentic text,
connectinfo_start text,
connectinfo_stop text,
acctinputoctets bigint,
"acctoutputoctets" bigint,
"calledstationid" text,
callingstationid text,
acctterminatecause text,
servicetype text,
framedprotocol text,
framedipaddress text,
acctstartdelay bigint,
acctstopdelay bigint,
xascendsessionsvrkey text,
client bigint,
method text,
zone bigint,
localDateStart text,
localDateStop text,
localDateTimeStart text,
localDateTimeStop text,
msisdn text,
PRIMARY KEY (radacctid, username)
) WITH CLUSTERING ORDER BY (username DESC);

I need to group by the "username" field.
In MySQL the query looks like this:
SELECT
a.username,
COUNT(DISTINCT a.username)
FROM
radacct as a
WHERE
(LENGTH(a.username) = 17)
GROUP BY
a.username;

When I execute the query "SELECT username, count(*) FROM radacct GROUP BY 
username;", I get:
InvalidRequest: Error from server: code = 2200 [Invalid query] message = 
"PRIMARY KEY, got username"
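In Cassandra 3.10 and later, GROUP BY only accepts primary key columns in their declared order, starting with the partition key. In radacct2 the partition key is radacctid and username is a clustering column, so GROUP BY username on its own is rejected. The usual ways around that are a separate table partitioned by username, or aggregating on the client. A minimal sketch of the client-side step, counting rows per username (as the CQL query attempts) after the MySQL query's length filter; the class name and sample values are hypothetical, and fetching the usernames with a driver is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class UsernameCounts {

    // Client-side stand-in for:
    //   SELECT username, COUNT(*) FROM radacct WHERE LENGTH(username) = 17 GROUP BY username
    // MySQL's LENGTH() counts bytes while String.length() counts UTF-16 units;
    // the two agree for plain ASCII values like the sample below.
    static Map<String, Long> countByUsername(List<String> usernames, int requiredLength) {
        return usernames.stream()
                .filter(u -> u != null && u.length() == requiredLength)
                .collect(Collectors.groupingBy(u -> u, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample: in practice these values would be paged out of
        // Cassandra (e.g. SELECT username FROM radacct2) with a driver.
        List<String> usernames = Arrays.asList(
                "AA-BB-CC-DD-EE-FF", "AA-BB-CC-DD-EE-FF", "11-22-33-44-55-66", "guest");
        System.out.println(countByUsername(usernames, 17));
    }
}
```

This still reads every row, so for repeated use a table whose partition key is username is usually the better fit.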


*PROBLEM 3. How can I improve performance with a proper configuration?*

Which parameters in /etc/cassandra/cassandra.yaml can be adjusted to get the 
biggest effect?



Re: Some problems with Apache Cassandra

2017-12-07 Thread ilyail3 K
You probably want to ask that question on the user mailing list rather
than the dev list:

https://www.mail-archive.com/user@cassandra.apache.org/

On Dec 7, 2017 10:12 AM, "v.elis...@rubic.pro"  wrote:



Apache Cassandra Wiki access

2017-12-07 Thread Russell Bateman
It appears that deeper access to the wiki is available on request? 
https://wiki.apache.org/cassandra/FrontPage states that "most of the 
information on this Wiki is being deprecated." Has that already happened? 
Please advise.


If so, please grant it to me. I don't know whether I have a "wiki 
username". If I need one and have to give it to you, please choose from:


   my e-mail address
   russell.bateman
   windofkeltia


Note: I'm specifically looking to write a custom/secondary index 
plug-in, similar to Stratio's Lucene index.


Thanks,

Russ


Re: Apache Cassandra Wiki access

2017-12-07 Thread Jon Haddad
The wiki is effectively dead.

Please contribute to the in-tree docs section: 
https://github.com/apache/cassandra/tree/trunk/doc 


I recently merged in an improvement that uses Docker to generate the docs. The 
short version:

cd ./doc

# build the Docker image
docker-compose build build-docs

# build the documentation
docker-compose run build-docs

Jon

> On Dec 7, 2017, at 11:25 AM, Russell Bateman  wrote: