This Article collects all the posts under the Measuring/Billing database usage in StratosLive.
My Job
WSO2 Data Services Server User Guide
Need to find I/O rates, bandwidth used by each Database user
Limiting The Resource Use
I continued
Suggestions and replies
Collecting and summarizing the captured data
Followed the BAM samples.
Do you need data to play with?
Prototype version 1
Prototype version 1 has to be verified.
1st Verification
OSGi Services
Publishing to BAM
Using OSGi console to debug things
[Break for Test Automation]
Back to the Frozen project
WSO2 Storage Server
The Inevitable Change
Strange things do happen
Using Hive Scripts to Analyze and Summarize BAM data
Difference between two time ignoring the date
Replacing for ntask(quartz-scheduler), using timer task
It is almost 'THE END'
Showing posts with label mysql. Show all posts
Friday, November 30, 2012
Measuring/Billing database usage in StratosLive - Summary
Labels:
bam,
BAM2,
billing,
database,
database size,
mysql,
remote debugging,
StratosLive RSS,
usage agent,
WSO2 Data Services Server
Thursday, November 15, 2012
Hive & Me Part 2
Continued from Hive & Me Part 1...
After creating the required MySQL and Hive tables, I moved on to the logic of the script. I had to get the sums of all the bandwidth-In and bandwidth-Out entries separately. Then sum(bandwidth-In) - sum(bandwidth-Out) gives the current value, and sum(bandwidth-Out) gives the history value. But re-summing everything every hour is extremely costly. It is better to sum only the entries from the last hour and compute the new current and history values from the previous ones. I got to know that we keep the time of the script's last run in a MySQL table, and that we write it into the Hive configuration file using a Java class. I used that value to get the sums of the entries from the last hour. However, it is not possible to add this last-hour summary to the previous current and history values in the same query, so I insert the last-hour summary with a new ID and then sum the 'final' and 'last hour' rows in the table.
INSERT INTO TABLE REGISTRY_USAGE_HOURLY_ANALYTICS
SELECT concat(TID, "LastHour"), TID, HISTORY_USAGE, TOTAL_IN-HISTORY_USAGE FROM
(SELECT TENANT_ID AS TID,
sum(PAYLOAD_VALUE) AS HISTORY_USAGE
FROM USAGE_STATS_TABLE
WHERE USAGE_STATS_TABLE.PAYLOAD_TYPE = 'ContentBandwidth-Out' AND Timestmp > ${hiveconf:last_hourly_ts}
GROUP BY SERVER_NAME, PAYLOAD_TYPE, TENANT_ID) table1
JOIN
(SELECT TENANT_ID AS TID2,
sum(PAYLOAD_VALUE) AS TOTAL_IN
FROM USAGE_STATS_TABLE
WHERE USAGE_STATS_TABLE.PAYLOAD_TYPE = 'ContentBandwidth-In' AND Timestmp > ${hiveconf:last_hourly_ts}
GROUP BY SERVER_NAME, PAYLOAD_TYPE, TENANT_ID) table2
ON(table2.TID2 = table1.TID);
The script above gets the summary of the usage in the last hour and inserts it into the table. The query below adds the last-hour summary to the previous final value and creates the final value for the current hour.
INSERT INTO TABLE REGISTRY_USAGE_HOURLY_ANALYTICS
SELECT concat(TENANT_ID, "Final"),
TENANT_ID,
sum(HISTORY_USAGE) as HISTORY_USAGE,
sum(CURRENT_USAGE) as CURRENT_USAGE
FROM REGISTRY_USAGE_HOURLY_ANALYTICS
GROUP BY TENANT_ID;
This results in a MySQL table where each tenant has two rows, 'final' and 'last hour'. The final row gives the current value (the size of all the data the user currently has in his registry) and the history value (the size of all the data the user has deleted up to now). This information is available for each tenant, correct to the last hour.
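The hourly rollup described in this post boils down to simple arithmetic. Below is a minimal sketch of that logic in plain Java (my own illustration; the method and variable names are mine, not part of the actual Hive job):

```java
// Sketch of the hourly summarization logic:
// history grows by everything deleted (bandwidth-Out) in the last hour;
// current grows by the net change (In minus Out) in the last hour.
public class UsageRollup {

    // previous history total + last-hour bandwidth-Out sum
    static long newHistory(long prevHistory, long lastHourOut) {
        return prevHistory + lastHourOut;
    }

    // previous current total + net change in the last hour
    static long newCurrent(long prevCurrent, long lastHourIn, long lastHourOut) {
        return prevCurrent + lastHourIn - lastHourOut;
    }

    public static void main(String[] args) {
        // previous hour: current = 100, history = 40
        // last hour: 30 bytes added, 10 bytes deleted
        System.out.println(newCurrent(100, 30, 10)); // 120
        System.out.println(newHistory(40, 10));      // 50
    }
}
```

This is exactly what summing the 'final' and 'last hour' rows achieves in the second Hive query, just written out as two additions.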
Labels:
BAM2,
hive,
JDBC Storage Handler for Hive,
mysql
Hive & Me Part 1
I started on a new project to summarize registry bandwidth data (which refers to the space used in the registry). As you might know, we can have BAM summarize data in Cassandra key spaces using Hive scripts. It was not easy to work with, given the lack of examples for Hive.
What I have to do
There was a table in Cassandra that contained registry usage data. When a user adds or removes something from his registry, an entry is recorded as "registryBandwidth-In" (when he adds something) or "registryBandwidth-Out" (when he deletes something). I had to summarize those records in such a way that we have access to the current value (the size of all the data the user currently has in his registry) and the history value (the size of all the data the user has deleted up to now). This information should be available for each tenant, correct to the last hour.
Implementation Plan
If I can write the current and history values into a MySQL table, where each tenant has a separate row, that is good enough. First I thought of having a table in Hive with the current and history values, and a MySQL table mapped to it.
The code below uses the JDBC Storage Handler for Hive; more information on how to use it can be found in Kasun's blog: http://kasunweranga.blogspot.com/2012/06/jdbc-storage-handler-for-hive.html
CREATE EXTERNAL TABLE IF NOT EXISTS REGISTRY_USAGE_HOURLY_ANALYTICS (
ID STRING,
TENANT_ID STRING,
HISTORY_USAGE BIGINT,
CURRENT_USAGE BIGINT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
"mapred.jdbc.driver.class" = "com.mysql.jdbc.Driver",
"mapred.jdbc.url" = "jdbc:mysql://localhost:3306/WSO2USAGE_DB",
"mapred.jdbc.username" = "root",
"mapred.jdbc.password" = "root",
"hive.jdbc.update.on.duplicate" = "true",
"hive.jdbc.primary.key.fields" = "ID",
"hive.jdbc.table.create.query" = "CREATE TABLE REGISTRY_USAGE_HOURLY_ANALYTICS (
ID VARCHAR(50),
TENANT_ID VARCHAR(50),
HISTORY_USAGE BIGINT,
CURRENT_USAGE BIGINT)"
);
This will create two tables: a Hive table and a MySQL table, both named "REGISTRY_USAGE_HOURLY_ANALYTICS". Whatever we write to the Hive table will be written to the MySQL table. In the next code block I create a mapping to the MySQL table. Using this temporary Hive table I can query the MySQL table.
CREATE EXTERNAL TABLE IF NOT EXISTS REGISTRY_USAGE_HOURLY_ANALYTICS_TEMP (
ID STRING,
TENANT_ID STRING,
HISTORY_USAGE BIGINT,
CURRENT_USAGE BIGINT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' TBLPROPERTIES (
"mapred.jdbc.driver.class" = "com.mysql.jdbc.Driver",
"mapred.jdbc.url" = "jdbc:mysql://localhost:3306/WSO2USAGE_DB",
"mapred.jdbc.username" = "root",
"mapred.jdbc.password" = "root",
"hive.jdbc.primary.key.fields" = "TENANT_ID",
"mapred.jdbc.input.table.name" = "REGISTRY_USAGE_HOURLY_ANALYTICS"
);
Continued in Part 2...
Saturday, June 30, 2012
1st Verification – Verification of “index length + data length = actual disk space used”
I think I found where the databases are stored. As I am working with XAMPP, they were stored in /opt/lampp/var/mysql. There were folders for each database. When I tried to open them, I got an error saying "you don't have permission". The next thing I tried was opening them in the terminal using sudo. Unluckily, it gave me an error saying "sudo: cd: command not found". I have to find a way to open such folders. I'll write a post on this if I find a way (or anyone who knows a way can help me by commenting below). Until then I used "sudo nautilus path/".
There were 3 files for each table in each DB: a .frm file, a .myd file and a .myi file. All three contribute to the space used:
- FRM files contain table structure information.
- MYD files contain the actual table data.
- MYI files contain the indexes.
I checked the sizes on disk with du, using the following options:
-s, --summarize (display only a total for each argument)
-h, --human-readable (print sizes in human readable format (e.g., 1K 234M 2G))
It showed that the figures given in information_schema are similar to the results given by the above command. But I had not taken the size of the .frm file into account. It looks like all the .frm files have the same length (12K), so I can sum them up if I know the number of tables. However, I have to check why others ignore this file in their calculations.
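The estimate described above can be written as a small helper (my own sketch; the fixed 12 KB .frm size is just what I observed on my machine, so treat it as an assumption):

```java
// Approximates the on-disk size of a schema from information_schema figures.
public class DbSizeEstimate {

    // Observed fixed .frm size per table (12K on my machine; an assumption).
    static final long FRM_BYTES = 12 * 1024;

    // information_schema.TABLES gives DATA_LENGTH + INDEX_LENGTH per table;
    // adding a fixed .frm overhead per table brings the total closer to 'du'.
    static long estimatedDiskBytes(long sumDataLength, long sumIndexLength, int tableCount) {
        return sumDataLength + sumIndexLength + FRM_BYTES * (long) tableCount;
    }

    public static void main(String[] args) {
        // e.g. 3 tables with 1 MB of data and 200 KB of indexes in total
        System.out.println(estimatedDiskBytes(1_000_000, 200_000, 3));
    }
}
```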
Some databases only have a .frm file
Going through the database folders, I saw that some of them only have a .frm file. Searching on this, I found out that there are two major engines in MySQL: MyISAM and InnoDB. If a table belongs to InnoDB, its folder only includes the .frm file; the data of such databases is stored in a single or multiple (you can configure that) .ibd files.
Labels:
database,
database size,
mysql,
wso2,
WSO2 Data Services Server
Prototype version 1 has to be verified.
This prototype is tested on my own machine with several MySQL users created by me, so it is not guaranteed to work in the real scenario. It should work with the actual billing architecture to be of any use for my work. That is the second and biggest verification that I have to do.
Before that, there is a simple and basic verification that I have to consider. In the prototype code I calculate a table size as follows (here, information from information_schema.TABLES is used):
Table Size = Data Length + Index Length
Both Data Length and Index Length can be found in information_schema.TABLES. I have to verify that the figure given by the above calculation is the correct table size.
Prototype version 1
As I was asked to build a proof of concept (POC), as I mentioned in previous posts, I started working on it. By now I have a working prototype that logs the DB size information. Here I create two tables. One contains user details: the user name, the user's DB size limit, and a binary column 'exceeded' that indicates whether the user has exceeded his limit (this column is not used further in my current implementation, but I thought it would be helpful to have such a column in the future).
The second table is a logging table; it logs the disk space used by each user from time to time. It has 3 columns: a timestamp, the user name, and the exceeded number of bytes (used bytes minus the user's limit in bytes).
Below is the simple code that does that job:
import java.sql.*;
public class sizeCheck {
public static void main(String args[]) {
Connection con = null;
Statement stat, stat2;
boolean debug = true;
try {
Class.forName("com.mysql.jdbc.Driver").newInstance();
con = DriverManager.getConnection("jdbc:mysql://localhost/", "root", "root");
if (debug && !con.isClosed())
System.out.println("Successfully connected to " + "MySQL server using TCP/IP...");
stat = con.createStatement();
stat2 = con.createStatement();
// query to select all of the data from a table
String selectQuery = "SELECT user,sizelimit FROM `Quota`.`Quota`";
// get the results
ResultSet results = stat.executeQuery(selectQuery);
// output the results
while (results.next()) {
String getDBsizes =
"SELECT SUM(DATA_LENGTH) as sumData, SUM(INDEX_LENGTH) as sumIndex FROM `information_schema`.`TABLES` WHERE TABLE_SCHEMA LIKE '" +
results.getString("user") + "%'";
ResultSet sizeResult = stat2.executeQuery(getDBsizes);
sizeResult.next();
if (debug) {
System.out.println(sizeResult.getInt("sumData"));
System.out.println(sizeResult.getInt("sumIndex"));
System.out.println(sizeResult.getInt("sumData") + sizeResult.getInt("sumIndex"));
}
if (sizeResult.getInt("sumData") + sizeResult.getInt("sumIndex") > results.getInt("sizelimit")) {
int exceededBytes =
sizeResult.getInt("sumData") + sizeResult.getInt("sumIndex") -
results.getInt("sizelimit");
String setExceeded =
"UPDATE `Quota`.`Quota` SET `exceeded` = '1' WHERE `Quota`.`user` = '" +
results.getString("user") + "';";
stat2.executeUpdate(setExceeded);
String logExceeded =
"INSERT INTO `Quota`.`exceed` (`datetime` ,`user` ,`exceedbytes` ,`other`) VALUES (CURRENT_TIMESTAMP , '" +
results.getString("user") +
"', '" +
exceededBytes +
"', '');";
stat2.execute(logExceeded);
// String revoke =
// "REVOKE INSERT, UPDATE, CREATE, ALTER ON `"+
// results.getString("user") +"\\_%` . * FROM '"+
// results.getString("user") +"'@'localhost';";
// if(debug)
// System.out.println(revoke);
// stat2.execute(revoke);
} else {
String setExceeded =
"UPDATE `Quota`.`Quota` SET `exceeded` = '0' WHERE `Quota`.`user` = '" +
results.getString("user") + "';";
stat2.executeUpdate(setExceeded);
// String grant = "GRANT ALL PRIVILEGES ON `"+
// results.getString("user") +"\\_%` . * TO '"+
// results.getString("user") +"'@'localhost';";
// if(debug)
// System.out.println(grant);
// stat2.execute(grant);
}
}
} catch (Exception e) {
System.err.println("Exception: " + e.getMessage());
} finally {
try {
if (con != null)
con.close();
} catch (SQLException e) {
// ignore errors while closing the connection
}
}
}
}
You can download it from below.
Monday, June 11, 2012
Do you need data to play with?
If you are doing any data-related work, you need some big data to work with. As an example, I am working with MySQL, and I wanted to try a big query that would populate my SLOW QUERY LOG. To get into the slow query log, a query must run for some time (although we can define that time, my tables were too small to produce even 1 ms). I needed some big tables that would populate my databases with some big data.
That's where I found the following .sql file that has a lot of data in it. It is some kind of a sample given by MySQL. Anyway, here it is.
https://docs.google.com/a/wso2.com/file/d/0B4VQdLMBav1WTXQ3cnY5RHdWWkE/edit
Go to it and save it.
Labels:
mysql,
sample sql data,
SLOW_QUERY_LOG,
wso2
Who thought that installing MySQL would take that long?
I wanted to see the list of users; as most of you know, that is done by:
select *
from mysql.user;
But this was very tedious, so I thought I should go to a GUI-based tool like phpMyAdmin, and so I thought of installing something like WAMP here. As I am working in Ubuntu there is no WAMP; I have to work with XAMPP, which is very similar.
Installing it was an easy job. Just go to http://www.apachefriends.org/en/xampp-linux.html and follow their 4 steps and you will be done.
Or use the commands below.
Then, when I tried to start the servers, it gave me the following message:
XAMPP is currently only available as a 32 bit application. Please use a 32 bit compatibility library for your system.
To fix that problem I had to install 32-bit libraries (which took several minutes to download [<70MB]). I got help from the following site (http://preprocess.me/comment/386). You have to run the command below to install the 32-bit libs:
sudo apt-get install ia32-libs
But when I started phpMyAdmin it gave me this error:
#2002 - The server is not responding (or the local MySQL server's socket is not correctly configured)
I always knew that something like this would pop up, as I had an already-installed MySQL server on my computer. So I thought of removing that existing MySQL server. This page helped me in that process.
A little problem came with the commands when copying and pasting them into bash, so here are the corrected ones:
apt-get --purge remove mysql-server
apt-get --purge remove mysql-client
apt-get --purge remove mysql-common
apt-get autoremove
apt-get autoclean
(Before 'purge' it should be --, not the en dash – that copy-pasting produced.)
Now restart the server and everything will work fine.
Saturday, June 2, 2012
Suggestions and replies
With my suggestion, others started to give their feedback. There were a few interesting and informative replies, as listed below.
Sanjeewa Malalgoda: I was looking into this subject some time back and found some points. AFAIU the number of transactions alone is not enough. I found some interesting tools like dbtuna[1] and jetprofiler[2]; I hope we can have a look at them and get some ideas. I have tested jetprofiler and it gives a lot of information related to DB usage.
I went through them and came to the conclusion below, which I posted as my reply.
"It is always good to know a person who has worked in the same area. I went through those 2 tools and they mainly target the management and administrative aspects of the DB server. They give us nice graphical representations of existing data. This can be very useful for understanding the usage patterns of users. But they do not give us any new information; they only present the data found in information_schema and the logs. As you have used them, you might know more about them, so correct me if I am wrong."
Jet Profiler
- Install Java 1.6 separately
- Unzip jetprofiler_v2.0.5.zip to the desired folder (e.g. /usr/local/bin/jetprofiler or /home/USER/bin/jetprofiler).
- Run ./jetprofiler
Labels:
mysql,
MySQL log files,
SLOW_QUERY_LOG,
wso2,
WSO2 Data Services Server
I continued
My suggestion from the previous day was not up to standard, so Amilam asked me to restructure it. This is version 2.
Problem:
Until now we don't have a way to measure the usage of the StratosLive RSS. There can be many views of the usage, like I/O per unit time, accumulated DB size per user, and bandwidth used by a user (I think this last one is already taken into account). We need to start measuring (tracking) the usage under at least one view (ideally all). The real problem comes with the limitations of MySQL: it has no built-in support for the things mentioned above.
We cannot limit DB space: http://www.howtoforge.com/forums/showthread.php?t=1944
We cannot limit/measure the bandwidth/IO rate of an individual database/user: http://forums.mysql.com/read.php?10,543847,543847
There is a way to set MAX_QUERIES_PER_HOUR and MAX_CONNECTIONS_PER_HOUR, but there is no way to get the current state of those counters (without messing with the MySQL code base, http://forums.mysql.com/read.php?35,179219).
Solutions:
There are external suggestions for limiting database size, and we could do something like that. As examples:
MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml
MySQL Quota-Tool: http://projects.marsching.org/mysql_quota/
And we can limit a user to MAX_QUERIES_PER_HOUR using built-in support in MySQL (http://dev.mysql.com/doc/refman/5.5/en/grant.html). This limit will not be a problem for the user, while the service is protected from an extreme number of requests per unit time (a kind of DoS).
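As a sketch of how such a limit could be issued from code, the helper below builds the GRANT statement with a per-hour query cap (the GRANT ... WITH MAX_QUERIES_PER_HOUR syntax is from the MySQL 5.5 manual linked above; the user name, host and limit here are made-up illustrative values):

```java
// Builds the MySQL statement that caps a user's queries per hour.
public class QueryLimit {

    // 'user', 'host' and 'maxQueriesPerHour' are illustrative inputs;
    // the statement would be executed over JDBC as an admin user.
    static String grantQueryLimit(String user, String host, int maxQueriesPerHour) {
        return "GRANT USAGE ON *.* TO '" + user + "'@'" + host
                + "' WITH MAX_QUERIES_PER_HOUR " + maxQueriesPerHour + ";";
    }

    public static void main(String[] args) {
        System.out.println(grantQueryLimit("tenant1", "localhost", 500));
    }
}
```

In a real deployment the user and host would come from the tenant's usage plan rather than being hard-coded.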
And there are suggested ways to limit the DB size per user, using cron jobs that notify the user of excess use of space.
How we can use what we have:
1. We can limit the size of the DB at the time of DB creation based on the usage plan.
2. We can define transactions per hour for all or selected usage plans
Please reply with your feedback on this; it will be very helpful.
Labels:
mysql,
StratosLive RSS,
wso2,
WSO2 Data Services Server
Limiting The Resource Use
Resuming from where I stopped yesterday, I started by looking at how to limit resource use.
Current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR, MAX_USER_CONNECTIONS variables.
Yes, I can limit it, but the ultimate goal of this project is to measure resource use, so it is always better to have something that serves that prime goal. I needed to know whether I can get any information about the current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR, and MAX_USER_CONNECTIONS variables.
I checked the MySQL metadata schemas (information_schema, performance_schema)
If MySQL can limit it, it should keep a count somewhere, right? That was my thinking, so I started looking through the default schemas like information_schema, performance_schema, etc. Sorry to say, I found nothing. But I still think it must save this data somewhere.
Limiting the database size (MySQL Quota Daemon)
I went through the library I found the other day (MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml) and it is a good tool (it requires Perl). Another very small piece of code that does the same thing is MySQL Quota-Tool (http://projects.marsching.org/mysql_quota/).
Labels:
mysql,
wso2,
WSO2 Data Services Server