Saturday, June 30, 2012

Remote debugging with Eclipse, for WSO2 Data Services Server (will work for any server).


While working with the RSS manager, I ran into some errors. The only way I knew to debug them was with print statements, but the sysouts did not help me figure out the error. I suspected that one variable was not initialized, so I needed a way to check that. I found this remote debugging option, which felt like home: it was like debugging a small app in the IDE.

This is how you can do it.

Go to Debug Configurations (right-click the project, then Debug As > Debug Configurations).
There, go to 'Remote Java Application'.
Fill it in like below; the port must match the debug port used when starting the server.


Run the server with the given port as the debug port:

./wso2server.sh --debug 8000

Put some breakpoints in and you are good to go...

1st Verification – Verification of “index length + data length = actual disk space used”


I think I found where the DBs are stored. As I am working with XAMPP, they were stored in /opt/lampp/var/mysql, where there was a folder for each database. When I tried to open them, I got an error saying “you don't have permission”. The next thing I tried was opening them in the terminal using sudo. Unluckily, that gave me an error saying “sudo: cd: command not found” (cd is a shell built-in rather than a program, so sudo cannot run it). I have to find a proper way to open such folders. I'll write a post on this if I find one (or anyone who knows a way can help me by commenting below). Till then I used “sudo nautilus path/”.

There were 3 files for each table in each DB: a .frm file, a .MYD file and a .MYI file. All three contribute to the space used.
  • FRM files contain table structure information.
  • MYD contains the actual table data.
  • MYI files contain the indexes.
Using “sudo ls -hs path” lists the files inside the folder given by the path, with their sizes.

-s, --size (print the allocated size of each file)
-h, --human-readable (print sizes in human readable format (e.g., 1K 234M 2G))



It showed that the figures given in the information schema are close to the results given by the above command. But I had not taken the size of the .frm file into account. It looks like all the .frm files have the same size (12K), so I can add them up if I know the number of tables. However, I have to check why others ignore this file in their calculations.
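The comparison above can also be scripted. Below is a minimal sketch, assuming the program is run by a user who can read the database folder; the class name DbFolderSizes and the method bytesByExtension are made up for this illustration. It adds up file sizes per extension so the .frm, .MYD and .MYI totals can be compared with information_schema.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums file sizes in one database folder, grouped by extension.
class DbFolderSizes {

    static Map<String, Long> bytesByExtension(Path dbFolder) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dbFolder)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-folders and other non-files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // normalise case so "MYD" and "myd" land in the same bucket
                String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // the folder is passed as the first argument; "." is only a fallback
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        bytesByExtension(folder).forEach((ext, bytes) -> System.out.println(ext + "\t" + bytes));
    }
}
```

Note that Files.size returns the byte length of each file, while ls -s reports allocated blocks, so the two can differ slightly for small files.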

Some databases only have .frm files

Going through the database folders, I saw that some folders only have .frm files in them. Searching for that, I found out that there are two major engines within MySQL: MyISAM and InnoDB. If a table belongs to InnoDB, its folder may contain only the .frm file. The data of such tables is stored in a single shared tablespace or in multiple per-table .ibd files (you can configure that).

Prototype version 1 has to be verified.


This prototype has been tested on my own machine with several MySQL users created by me, so it is not guaranteed to work in the real scenario. It has to work with the actual billing architecture to be of any use for my work. This is the second and biggest verification that I have to do.

Before that, there is a simple and basic verification that I have to consider. In the prototype code I calculate a table size as follows (using information from information_schema.TABLES):

Table Size = Data Length + Index Length

Both Data Length and Index Length can be found in information_schema.TABLES. I have to verify that the figure given by the above calculation is the correct table size.
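As a small illustration of the formula above, here is a sketch that keeps the arithmetic in long values, since byte counts can easily pass the 2 GB limit of an int. The class name TableSizeSketch, the method tableSize and the query constant are my own names for this example, not part of the prototype.

```java
// Hypothetical sketch of "Table Size = Data Length + Index Length".
class TableSizeSketch {

    // Per-table lengths come from information_schema.TABLES; the schema name
    // is meant to be bound as a parameter when this query is prepared.
    static final String SIZE_QUERY =
            "SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH "
          + "FROM information_schema.TABLES WHERE TABLE_SCHEMA = ?";

    // Adds the two lengths; addExact throws ArithmeticException on overflow
    // instead of silently wrapping around.
    static long tableSize(long dataLength, long indexLength) {
        return Math.addExact(dataLength, indexLength);
    }

    public static void main(String[] args) {
        // example values only: 3 GiB of data plus 1 MiB of indexes
        long size = tableSize(3_221_225_472L, 1_048_576L);
        System.out.println(size + " bytes");
    }
}
```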

Prototype version 1


As I was asked to build a proof of concept (POC), as I mentioned in earlier posts, I started working on it. By now I have a working prototype that logs the DB size information. Here I create two tables. One contains user details: the user name, the user's DB size limit and a binary column 'exceeded' that indicates whether the user has exceeded his limit (this column is not used further in my current implementation, but I thought it would be helpful to have such a column in the future).

The second table is a logging table, which logs the disk space used by each user from time to time. This again has 3 columns: a time stamp, the user name and the exceeded number of bytes (used number of bytes - user limit in bytes).

Below is the simple code that does that job.

import java.sql.*;

public class sizeCheck {

 public static void main(String args[]) {
  Connection con = null;
  Statement stat, stat2;
  boolean debug = true;
  try {
   Class.forName("com.mysql.jdbc.Driver").newInstance();
   con = DriverManager.getConnection("jdbc:mysql://localhost/", "root", "root");

   if (debug && !con.isClosed())
    System.out.println("Successfully connected to " + "MySQL server using TCP/IP...");

   stat = con.createStatement();
   stat2 = con.createStatement();

   // query to select all of the data from a table
   String selectQuery = "SELECT user,sizelimit FROM `Quota`.`Quota`";
   // get the results
   ResultSet results = stat.executeQuery(selectQuery);

   // output the results
   while (results.next()) {
    String getDBsizes =
                        "SELECT SUM(DATA_LENGTH) as sumData, SUM(INDEX_LENGTH) as sumIndex FROM `information_schema`.`TABLES` WHERE TABLE_SCHEMA LIKE '" +
                                results.getString("user") + "%'";
    ResultSet sizeResult = stat2.executeQuery(getDBsizes);
    sizeResult.next();
    if (debug) {
     System.out.println(sizeResult.getLong("sumData"));
     System.out.println(sizeResult.getLong("sumIndex"));
     System.out.println(sizeResult.getLong("sumData") + sizeResult.getLong("sumIndex"));
    }
    // sizes are read as long: an int cannot hold totals above 2 GB
    if (sizeResult.getLong("sumData") + sizeResult.getLong("sumIndex") > results.getLong("sizelimit")) {
     // note: despite its name, this value is in bytes
     long exccededGB =
                      sizeResult.getLong("sumData") + sizeResult.getLong("sumIndex") -
                              results.getLong("sizelimit");
     String setExceeded =
                          "UPDATE `Quota`.`Quota` SET `exceeded` = '1' WHERE `Quota`.`user` = '" +
                                  results.getString("user") + "';";
     stat2.executeUpdate(setExceeded);
     String logExceeded =
                          "INSERT INTO `Quota`.`exceed` (`datetime` ,`user` ,`exceedbytes` ,`other`) VALUES (CURRENT_TIMESTAMP , '" +
                                  results.getString("user") +
                                  "', '" +
                                  exccededGB +
                                  "', '');";
     stat2.execute(logExceeded);
     // String revoke =
     // "REVOKE INSERT, UPDATE, CREATE, ALTER ON `"+
     // results.getString("user") +"\\_%` . * FROM '"+
     // results.getString("user") +"'@'localhost';";
     // if(debug)
     // System.out.println(revoke);
     // stat2.execute(revoke);
    } else {

     String setExceeded =
                          "UPDATE `Quota`.`Quota` SET `exceeded` = '0' WHERE `Quota`.`user` = '" +
                                  results.getString("user") + "';";
     stat2.executeUpdate(setExceeded);
     // String grant = "GRANT ALL PRIVILEGES ON `"+
     // results.getString("user") +"\\_%` . * TO '"+
     // results.getString("user") +"'@'localhost';";
     // if(debug)
     // System.out.println(grant);
     // stat2.execute(grant);
    }
   }

  } catch (Exception e) {
   System.err.println("Exception: " + e.getMessage());
  } finally {
   try {
    if (con != null)
     con.close();
   } catch (SQLException e) {
   }
  }

 }

}
You can download it from below.
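The prototype builds its SQL by concatenating the user name into the statement. A possible refinement, sketched here under my own assumptions, is to bind the name as a parameter and to escape the LIKE wildcards in it, so that a user called a_b does not also match schemas starting with axb. The names SchemaPrefixSketch, likePrefix and usedBytes are made up for this example, and the escaping assumes MySQL's default backslash escape character.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch: same size query as the prototype, with a bound parameter.
class SchemaPrefixSketch {

    // Escapes \, % and _ so they match literally, then appends the trailing
    // % wildcard that the prototype's "LIKE 'user%'" relies on.
    static String likePrefix(String prefix) {
        return prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_") + "%";
    }

    // Total bytes used by all schemas whose name starts with the user name.
    static long usedBytes(Connection con, String user) throws SQLException {
        String sql = "SELECT COALESCE(SUM(DATA_LENGTH + INDEX_LENGTH), 0) "
                   + "FROM information_schema.TABLES WHERE TABLE_SCHEMA LIKE ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, likePrefix(user));
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1); // long, so totals above 2 GB fit
            }
        }
    }

    public static void main(String[] args) {
        // only shows the pattern; usedBytes needs a live MySQL connection
        System.out.println(likePrefix("bob_1"));
    }
}
```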


Tuesday, June 12, 2012

First Salary Party @ DineMore

Even though it was not a big salary, we didn't forget to celebrate what we got. As a lot of us thought it was no good to ride a long distance for this, we decided to have it somewhere near. The closest place was DineMore. I really didn't have any preferences about where we should go, what we should eat, etc. I only needed it to be somewhere close.

we @ wso2

Monday, June 11, 2012

Do you need data to play with?


If you are doing any data-related work, you need some big data to work with. As an example, I am working with MySQL, and I wanted to try a big query that would show up in my 'SLOW QUERY LOG'. To get into the slow query log, a query must run for some time (although we can define that time, my tables were too small to take even 1ms). I needed some big tables that would populate my databases with some big data.

That's where I found the following .sql file that has a lot of data in it. It is some kind of sample given by MySQL. Anyway, here it is.

https://docs.google.com/a/wso2.com/file/d/0B4VQdLMBav1WTXQ3cnY5RHdWWkE/edit
Go to it and save it.

Who thought that installing MySQL would take that long?


I wanted to see the list of users. As most of you know, it is done by:
select * from mysql.user;

But this was very tedious, so I thought I should move to a GUI-based tool like phpMyAdmin. So I thought of installing something like WAMP here. As I am working in Ubuntu, there is no WAMP; I have to work with XAMPP, which is very similar.

Installing it was an easy job. Just go to http://www.apachefriends.org/en/xampp-linux.html and follow the 4 steps there and you will be done.



Then, when I tried to start the servers, it gave me the following message:
XAMPP is currently only availably as 32 bit application. Please use a 32 bit compatibility library for your system.

To fix that problem I had to install 32-bit libraries (which took several minutes to download [<70MB]) on my computer. I got help from the following site (http://preprocess.me/comment/386).

You have to run the command below to install the 32-bit libs:
sudo apt-get install ia32-libs


But when I started phpMyAdmin, it gave me this error:
#2002 - The server is not responding (or the local MySQL server's socket is not correctly configured)

I always knew that something like this would pop up, as I already had a MySQL server installed on my computer.

So I thought of removing the existing MySQL server. This page helped me in that process.
There was a little problem with the commands when copying and pasting them into bash, so here are the corrected ones.
apt-get --purge remove mysql-server
apt-get --purge remove mysql-client
apt-get --purge remove mysql-common
apt-get autoremove
apt-get autoclean
// before "purge" it should be two hyphens (--), not an en dash (–)

Now restart the server and everything will work fine.

Thursday, June 7, 2012

Followed the BAM samples.


Continued what I did the day before. Downloaded the BAM2 alpha 2 binary distribution and documents from the BAM home page (http://wso2.com/products/business-activity-monitor). There I followed the example project given in the samples (KPI Sample Guide: wso2bam-2.0.0-ALPHA2-docs/kpi-sample.html). It was so interesting to see raw data being visualized in various types of charts. There you can find an agent that will pump events to the BAM server.

Problems that I encountered:

How to run the agent (jar) that we get after building with ant: we don't have to run it; the ant script will automatically run it after building.

Some steps were impossible to complete: I had the alpha 1 version with me and that was not in sync with the tutorial. I had to download alpha 2.

Wednesday, June 6, 2012

Collecting and summarizing the captured data


As I had come to some kind of understanding of how to capture data and what data to capture, I started to think about how we can manage and summarize that data. For that reason I started reading about “WSO2 Business Activity Monitor”. This tool is already used for summarizing the bandwidth data currently collected in the AS. Find more information about BAM 2 at the following link (http://wso2.com/products/business-activity-monitor/).

If you are interested in BAM, you can watch a webinar on YouTube (http://www.youtube.com/watch?feature=player_embedded&v=toIeQNG_Y8E) or you can follow the documentation found at the following link (http://wso2.org/project/bam/1.3.2/docs/samples.html).

[Image: from wso2 bam2 docs]

[Image: from wso2 bam2 docs]

Business-activity-monitoring

This is a 3 step process,
  1. Capturing
  2. Analysis
  3. Presenting

When working with BAM, you only have to worry about capturing the data and sending it to the BAM server. When going through BAM you may come across these two words: Hadoop and Cassandra. In BAM, data storage is based on Cassandra, whereas the analyzer framework is based on Hadoop. If you don't know what they are, read the definitions below.

[Image: from wso2 bam2 docs]

Hadoop: Hadoop is open source software that enables distributed parallel processing of huge amounts of data across inexpensive, commodity servers. (http://www.cloudera.com/what-is-hadoop/)

Cassandra: Apache Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. (http://en.wikipedia.org/wiki/Apache_Cassandra)

Tuesday, June 5, 2012

Coding Styles and Eclipse


When we are working in a group, it is always important to keep a fixed standard or way of coding throughout the whole group. This increases the readability and maintainability of the code. In WSO2 we are given a set of coding standards. In order to automate this process, you can configure your IDE to perform some formatting-related tasks automatically. In Eclipse, we use Code Templates to configure the IDE.

Code Cleanup Template


What is 'code cleanup'? It refers to the way Eclipse cleans up your code when the cleanup command is given. It removes all the unnecessary and non-standard parts according to the given configuration. We are given this configuration through an XML file (https://sites.google.com/a/wso2.com/engineering/Home/eclipsecodeformattingtemplatesforwso2developers/WSO2_Eclipse_Code_Cleanup.xml). It sets the configuration below.

  • Change non static accesses to static members using declaring type
  • Change indirect accesses to static members to direct accesses (accesses through subtypes)
  • Convert control statement bodies to block
  • Convert for loops to enhanced for loops
  • Remove unnecessary parentheses
  • Add missing '@Override' annotations
  • Add missing '@Override' annotations to implementations of interface methods
  • Add missing '@Deprecated' annotations
  • Remove unnecessary casts
  • Remove unnecessary '$NON-NLS$' tags
  • Add unimplemented methods
  • Remove trailing white spaces on all lines
  • Correct indentation
If you want, you can have your own profile, and you can even export it. You can find out how to do it in the following article: http://www.ibm.com/developerworks/library/os-eclipse-clean/

Code Formatting Template


This template takes care of how formatting happens at the code level. Indentation, braces, white space, comments, etc. are formatted according to the given configuration. Again, this formatting template is given to us in WSO2 as an XML file (https://sites.google.com/a/wso2.com/engineering/Home/eclipsecodeformattingtemplatesforwso2developers/WSO2_Eclipse_Code_Formatter.xml).

In order to apply these templates, follow the steps given below.

  1. Window -> Preferences
  2. Java-> Code Style
  3. Select Clean up option in the preference page and point to the WSO2_Eclipse_Code_Cleanup.xml.
  4. Select Formatter option in the preference page and point to the WSO2_Eclipse_Code_Formatter.xml
To make our lives easier, we can use Eclipse shortcuts: http://www.a2ztechguide.com/2011/08/eclipse-ide-keyboard-shortcuts-for.html

Saturday, June 2, 2012

Suggestions and replies


With my suggestion, others started to give their feedback. There were a few interesting and informative pieces of feedback, as listed below.

Sanjeewa Malalgoda: I was looking into this subject some times back and found some points. AFAIU only number of transactions is not enough. Found some interesting tools like dbtuna[1] and jetprofiler[2] I hope we can have a look at them and get some idea. I have tested jetprofiler and it gives lot of information related the db usage.



I went through them and came to the conclusion below, which I made my reply.

“It is always good to know a person who has worked in the same area. I went through those 2 tools and they mainly target the management and administrative aspects of the db server. They give us nice graphical representations of existing data. This can be very useful when understanding the usage patterns of users. But this does not give us any new information; it only presents the data found in the information_schema and logs. As you have used it you might know more about it, so correct me if I am wrong.”

Jet Profiler

  1. Install Java 1.6 separately
  2. Unzip jetprofiler_v2.0.5.zip to the desired folder (e.g. /usr/local/bin/jetprofiler or /home/USER/bin/jetprofiler).
  3. Run ./jetprofiler

I continued


So my suggestion from the day before was not up to the standard, and Amilam asked me to restructure it. This is version 2.

Problem:

Until now we have not had a way to measure the usage of the StratosLive RSS. There can be many views of the usage, like I/O per unit time, accumulated DB size per user and bandwidth used by a user (I think this is already taken into account). We need to start measuring (tracking) the usage under at least one view (ideally all). The real problem comes from the limitations of MySQL: it does not have built-in support for the things mentioned above.

We cannot limit DB space: http://www.howtoforge.com/forums/showthread.php?t=1944
We cannot limit/measure the bandwidth/IO rate of an individual database/user: http://forums.mysql.com/read.php?10,543847,543847
There is a way to set MAX_QUERIES_PER_HOUR and MAX_CONNECTIONS_PER_HOUR, but there is no way to get the current state of those variables (without messing with the MySQL code base, http://forums.mysql.com/read.php?35,179219)

Solutions:

There are external suggestions for limiting database size; we can do something like that. As examples,
MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml
MySQL Quota-Tool: http://projects.marsching.org/mysql_quota/

And we can limit the user to a MAX_QUERIES_PER_HOUR using the built-in support in MySQL (http://dev.mysql.com/doc/refman/5.5/en/grant.html). This limit will not be a problem for the user, while the service is protected from getting an extreme number of requests per unit time (a kind of DoS).

And there are suggested ways to limit the DB size per user, using cron jobs that will notify the user of excess usage of space.

How we can use what we have:

1. We can limit the size of the DB at the time of DB creation based on the usage plan.
2. We can define transactions per hour for all or selected usage plans

Please reply with your feedback on this view, it will be very helpful...

Limiting The Resource Use


Resuming from where I stopped yesterday, I started by looking at how to limit the resource use.

Current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR, MAX_USER_CONNECTIONS variables.


Yes, I can limit it, but the ultimate goal of this project is to measure the resource use, so it is always better to have something that serves my prime goal. So I needed to know whether I can get any information about the current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR and MAX_USER_CONNECTIONS variables.

I checked in the MySQL metadata schemas (information_schema, performance_schema)

If MySQL can limit it, it should keep a count somewhere, right? That was how I thought, so I started looking through some default schemas like information_schema, performance_schema, etc. Sorry to say that I found nothing. But I still think it should save this data somewhere.


Limiting the database size (MySQL Quota Daemon)




I went through the library I found the day before (MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml) and it is a good tool (it requires Perl). Another very small piece of code for doing the same thing is MySQL Quota-Tool (http://projects.marsching.org/mysql_quota/).

Need to find I/O rates, bandwidth used by each Database in the MySQL server.


So where I went yesterday seems to have been misleading. I thought I had to measure the bandwidth used by the requests coming into the WSO2 Data Services Server. However, what I actually have to measure is the queries that happen between the web service end and the database end. At first I thought that if I calculated (simply summed) the length fields of the SOAP requests it would be enough, but I got to know that they (WSO2) are already measuring that. What I have to do is measure the connections made when the SOAP request is read and the DB queries are made.

MySQL log files


I thought there could be interesting information in the MySQL log files, so I searched on that. The search gave me this resource (http://dev.mysql.com/doc/refman/5.6/en/server-logs.html), which talks about different types of server logs like the general log, error log and slow query log. In the end I found them of no interest to my problem.

Server Status Variables

My next stop was server status variables (http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html). These give us a lot of valuable information about the server, and they are closely related to measuring server usage. Nevertheless, our main problem of finding the usage of an individual database is still not solved.

Limiting the number of connections, size, number of queries


Being exhausted with finding a way to measure size and I/O rates, I started searching for a way to limit those parameters. Lucky me, I found something, even if it is not what I wanted. So I am thinking about a billing system where we will not measure but limit the usage of the user. The user can select a plan that suits him. As an example, he can select a plan giving 20GB of space + 1M queries per hour.

How to limit other parameters like no_of_connections, no_of_queries_per_hour, etc.: http://dev.mysql.com/doc/refman/5.5/en/grant.html
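To make the plan idea concrete, here is a minimal sketch of how a plan's limits could be turned into a GRANT statement with the MAX_QUERIES_PER_HOUR and MAX_CONNECTIONS_PER_HOUR options described on that page. The class ResourceLimitSketch, the method limitStatement and the example numbers are my own assumptions. The name check is there because the account name is written into the statement text directly, so it should never come from unchecked input.

```java
// Hypothetical sketch: builds a MySQL 5.5-style GRANT with per-hour limits.
class ResourceLimitSketch {

    static String limitStatement(String user, String host,
                                 int maxQueriesPerHour, int maxConnectionsPerHour) {
        // allow only simple names, so nothing can break out of the quotes
        if (!user.matches("[A-Za-z0-9_]+") || !host.matches("[A-Za-z0-9_.%-]+")) {
            throw new IllegalArgumentException("unexpected characters in account name");
        }
        if (maxQueriesPerHour < 0 || maxConnectionsPerHour < 0) {
            throw new IllegalArgumentException("limits must not be negative");
        }
        // USAGE grants no privileges; it only attaches the resource limits
        return "GRANT USAGE ON *.* TO '" + user + "'@'" + host + "'"
             + " WITH MAX_QUERIES_PER_HOUR " + maxQueriesPerHour
             + " MAX_CONNECTIONS_PER_HOUR " + maxConnectionsPerHour;
    }

    public static void main(String[] args) {
        // example plan only: 1M queries and 100 connections per hour
        System.out.println(limitStatement("alice", "localhost", 1_000_000, 100));
    }
}
```

The space part of a plan (for example 20GB) is not covered by GRANT, so it would still need a separate check like the quota prototype.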

WSO2 Data Services Server User Guide


With the specified job of figuring out a way to do billing on 'WSO2 Data Services Server', I started by understanding the server itself. I was midway through the 'WSO2 Data Services Server' user guide at the start of 28th of May 2012 (today). As I was not expected to fully understand its features, I went through them very quickly. As some of those features are very interesting, I stopped for a while to try them on my newly installed server.

After going through the user guide I got a fair understanding of the features and capabilities of 'WSO2 Data Services Server'. My next step was to look at any competitors out there in the market giving the same functionality. What I found was that there are no competitors giving the exact or exceeding functionality. But Amazon Relational Database Service and HP Relational Database Service seemed close enough to compare. Of those two, the Amazon one was closer (still it is not as complete as 'WSO2 Data Services Server').

My Job

My job is to design and build a billing system for the WSO2 Data Services Server.




WSO2 Data Services Server:


Behind most application silos are heterogeneous and disparate data stores. The WSO2 Data Services Server augments Service Oriented Architecture (SOA) development efforts by providing an easy to use platform for integrating data stores, creating composite data views, and hosting data services.
Data services provide unprecedented data access and straightforward integration with business processes, mashups, gadgets, business intelligence and mobile applications. The WSO2 Data Services Server supports secure and managed data access across federated data stores, data service transactions, and data transformation and validation using a lightweight, developer friendly, agile development approach.
Our lean software development process creates an important customer benefit; our cost. WSO2 Data Services Server offers significant time saving and affordable acquisition. Purpose-built for rapid configuration and efficient extension, users agree the product is easy to configure and extend. These attributes lead to lower investment and higher ROI.

From: http://wso2.com/products/data-services-server

Starting the WSO2 internship


In the first week we were asked to get familiar with the technologies that we will come across in the future. These technologies are relevant to anyone who is in this field (you don't have to work at WSO2). Below I have listed what I went through in the first week, with the references that they use. (Thanks go to Selvaratnam Uthaiyashankar for providing these materials for us.)

XML
=====
1. What is XML? -
* http://www.ibm.com/developerworks/edu/x-dw-xmlintro-i.html?S_TACT=105AGX06&S_CMP=HP
(Introduction to XML developerWorks, Doug Tidwell
(dtidwell@us.ibm.com), XML Evangelist, IBM ). It covers pretty much
everything. Do not worry about DTD, just XML Schema would do.
* http://www.ibm.com/developerworks/xml/newto/
* http://www.xml.com/lpt/a/316
2. XML Parsing, DOM, SAX and Pull - Learn about DOM, and SAX, pull
parsing you can learn later.
* SAX/DOM -http://www.ibm.com/developerworks/xml/library/x-jaxp/
* SAX - http://www.ibm.com/developerworks/xml/library/x-saxapi/
* Validation -
http://www.ibm.com/developerworks/xml/library/x-jaxpval.html?S_TACT=105AGX06&S_CMP=EDU
* StAX'ing up XML, Part 1: An introduction to Streaming API
for XML (StAX) -
http://www.ibm.com/developerworks/xml/library/x-stax1.html - there are
three articles in the StAX'ing up XML, read them later. We do not need
Stax initially for our work.
3. XML Schema - http://www.w3schools.com/Schema/default.asp
4. XML Beans -
http://xmlbeans.apache.org/documentation/tutorial_getstarted.html


Web Services
============
Learn about Axis2 as much as possible. Follow the
Axis2 user guide A to Z,
(1) http://axis.apache.org/axis2/java/core/docs/userguide.html
(2) Apache Axis2 Web Services, 2nd Edition

1. Download and Install Axis2, read user/ developer Guide
2. Invoke a default service. Basically learn all the options, both
POJO and code generate a client and use the client
3. Write a new service and deploy it, and invoke it using your client.

First day, More fun. Less work.


This day (15-05-2012) was my first day at WSO2 as a trainee. We were divided into groups and given a basic introduction to everything. Most of the time we were playing table tennis, carom and foosball.


WSO2


Vision:
Founded in August 2005, WSO2 is a global enterprise middleware corporation with offices in USA, UK and Sri Lanka. Providing the only complete open source middleware platform, WSO2 is revolutionizing the industry by putting traditional middleware on a diet and introducing lean, powerful and flexible solutions to address the 21st century enterprise challenges.

Products:
With its revolutionary component-based design, WSO2 middleware adapts to the project for a lean, targeted solution to enterprise applications.
Fully cloud-native, the WSO2 middleware platform is also the only open source platform-as-a-service for private, public and hybrid cloud deployments available today. With WSO2, seamless migration and integration between servers, private clouds, and public clouds are now a reality.



Open Source and Standards:
All WSO2 products are 100% open source and released under the Apache License Version 2.0. Contributing to key international standards organizations and foundations such as W3C, OASIS, OpenID Foundation, Infocard Foundation, Microsoft’s Interop Vendor Alliance, AMQP Working Group, oCERT, and Cloud Security Alliance, we possess the critical skills and experience to competently manage your problems in an open and transparent environment.



Who We Are
Team WSO2 consists of experts in their respective domains, who have helped define numerous Web service standards and norms which are widely used today such as XML, SOAP, Apache Axis/ Axis 2, Apache Synapse, WSDL and much more. WSO2 has also been involved in welfare projects such as the Sahana project, which is now the world’s leading disaster management system.



Our Culture
We are motivated by passion for work, driven by a free, open and collaborative company culture that centers on client satisfaction and integrity. The glory and success of WSO2 invariably lie on the fact that it has been nourished by the best-of-breed experts from its humble beginnings to this day.



From: http://wso2.com/about