
Wednesday, September 26, 2012

It is almost 'THE END'

Now, here is a summary of what I have done.


It was agreed to measure the database space usage of each tenant. We will not limit a tenant's database access based on its DB usage, but we will keep track of the excess DB space used by each tenant.

Component-level view of the process:



Changes to each component:

Rss-manager: This component gathers usage data from the RSS and adds it to a queue, which is in turn read by the usage-agent component. The usage data collection is handled by a couple of newly added classes and is scheduled to run daily; it is configurable to start at a given time and repeat at a given interval (currently decided to be 24 hours). Here we are only interested in tenants that have exceeded their limits, so we need to know a tenant's usage plan in order to look up those limits. We decided to publish information only about tenants who exceed their space limits, for two reasons (a sketch of the scheduling follows the list below):
  1. To reduce the data transfer between components and to the BAM server.
  2. Exceeded DB size is all we need for billing calculations.
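
Below is a minimal sketch of how such a daily, configurable collection task could be wired up. This is an illustration only: the class names, the queue, and collectExceededUsage() are hypothetical, not the actual rss-manager classes.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the rss-manager side: a task that runs daily,
// finds tenants whose DB usage exceeds their plan limit, and queues them
// for the usage-agent component to pick up.
public class UsageCollectionScheduler {

    // Shared queue that the usage-agent component reads later.
    public static final BlockingQueue<TenantUsage> USAGE_QUEUE =
            new LinkedBlockingQueue<TenantUsage>();

    public void schedule(long initialDelayHours) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Start at the configured time, then repeat every 24 hours.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                for (TenantUsage usage : collectExceededUsage()) {
                    USAGE_QUEUE.offer(usage); // only tenants over their limit are queued
                }
            }
        }, initialDelayHours, 24, TimeUnit.HOURS);
    }

    private List<TenantUsage> collectExceededUsage() {
        // Placeholder: query the RSS for per-tenant sizes and compare them
        // against the limits from each tenant's usage plan.
        return new ArrayList<TenantUsage>();
    }

    public static class TenantUsage {
        public String tenantDomain;
        public long exceededBytes;
        public long databaseSize;
    }
}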

Usage-agent: This component retrieves the usage data from the above-mentioned queue in the rss-manager. This is handled by a newly added class, DatabaseUsageDataRetrievalTask, which is also scheduled to run daily and is likewise configurable to start at a given time and repeat at a given interval (currently decided to be 24 hours).
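
The usage-agent side is the mirror image. Again this is an illustration only (the real class is DatabaseUsageDataRetrievalTask; this sketch just drains the hypothetical queue above):

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the usage-agent side: drain the shared queue daily
// and forward each record to the publisher (see "Publishing to BAM" below).
public class DatabaseUsageRetrievalScheduler {
    public void schedule(long initialDelayHours) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
            public void run() {
                UsageCollectionScheduler.TenantUsage usage;
                while ((usage = UsageCollectionScheduler.USAGE_QUEUE.poll()) != null) {
                    System.out.println(usage.tenantDomain + " exceeded by "
                            + usage.exceededBytes + " bytes");
                }
            }
        }, initialDelayHours, 24, TimeUnit.HOURS);
    }
}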

Stratos-commons: This is where the usage plan details are handled. Plan details are read from 'multitenancy-packages.xml' and made available through a service. I have changed the XML file, the XML reading class, and the data storage bean to carry the DB usage related data.
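
As an illustration of the kind of change involved (the element and field names here are assumptions, not the actual ones):

// Hypothetical fragment: the usage-plan bean gains a DB-space field that is
// populated from a new element such as <dbSpaceLimitMb> in multitenancy-packages.xml.
public class PackageInfo {
    private String name;
    private long dbSpaceLimitMb;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public long getDbSpaceLimitMb() { return dbSpaceLimitMb; }
    public void setDbSpaceLimitMb(long dbSpaceLimitMb) { this.dbSpaceLimitMb = dbSpaceLimitMb; }
}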

Dependencies: this work depends on a yet-to-be-developed component (to get a tenant's usage plan given the tenant domain/ID), and that component is required for the changed RSS-Manager component to work properly.


Sunday, September 9, 2012

WSO2 Storage Server

This is a new product by WSO2. It contains some of the data-related functionality that was in WSO2 DSS (Data Services Server). The removed functionality does not fit under the name "data services", but it was useful for data services, so I think it is a good idea to have it as a separate product.

This new product (WSO2 Storage Server) will most probably include the following as its basics:
RSS (Relational Storage Service/Server)
CSS (Column Storage Service/Server)
HDFSSS (HDFS Storage Service/Server)

It is not available for download yet, but it will be soon. I only know a few details about the Storage Server; when I learn more, I'll add those details here as well.

Thursday, September 6, 2012

Back to the Frozen project


With the end of the automation hackathon we are back on our original projects, which were frozen for over a month. I have forgotten almost everything related to my project. Luckily I have my own blog to read :D. There are several things to do before starting all over again. First I need to move to the trunk again (as this work will not be in the next release): I have to take an svn up in the trunk and build it (as we were working in the branch for the last month).

I did an svn up and started building. As soon as the build entered a certain component in platform I got an error; it looks like it is not fixed yet and the fix seems to be a long one. At the same time I had the feeling that this specific component might not be needed for the product I am working on (DSS). As I don't have a clear and complete idea of the components used in DSS, hoping it was not used and wanting to avoid unnecessary problems, I started the build with the option -P product-<product name>, in this case mvn install -P product-dss. Soon I got to know that the component is in DSS too :(. Anyway, it is good to know the Maven command to build only one product.

Building only one product
This is something new I learned during the automation hackathon. There we worked on a product called greg and didn't want to build anything other than what was needed to continue our work on greg testing; building the whole thing is too expensive. Anyone who needs to build only the components, stubs, etc. needed for one product can use the method below.

  1. Go to the platform directory.
  2. Enter the build command you normally use with -P product-<product name> added to the end, replacing <product name> with the product you are targeting (e.g. mvn install -P product-dss).

Tuesday, July 10, 2012

Using OSGi console to debug things


The OSGi console was very helpful to me in finding problems related to OSGi bundles. You can find a lot of information about it on the Internet; here I am going to write about how I used it for debugging.

First, start the server with the following option:

-DosgiConsole. This starts the server with the OSGi console, which is very helpful in situations like checking whether all the bundles you put into the dropins directory have been activated. When the server starts, type ss in the console; this will give a list of all bundles.




Use it with the 'like' modifier to get only the relevant bundles, e.g. ss like student
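
The output looks something like this (the bundle IDs, names and versions here are made up for illustration):

osgi> ss like student
id      State       Bundle
54      ACTIVE      org.example.student.core_1.0.0
55      INSTALLED   org.example.student.ui_1.0.0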

Now you can see the state of all bundles. When a bundle is installed in the OSGi runtime, it is persisted in a local bundle cache. The OSGi runtime then tries to resolve all dependencies of the bundle. If all required dependencies are resolved, the bundle is in the RESOLVED status; otherwise it is in the INSTALLED status.
If several bundles exist which would satisfy a dependency, the bundle with the highest version is used; if the versions are the same, the bundle with the lowest ID is used. If the bundle is started, its status is STARTING; afterwards it gets the ACTIVE status. (http://www.vogella.com/articles/OSGi/article.html#osgiarch_bundles is a good reference.)

You need all your bundles activated. If not, check what is wrong with them using the 'bundle <bundle number>' command. If there is no error, try starting the bundle manually using 'start <bundle number>'.



Monday, July 9, 2012

Publishing to BAM


I created this just to publish the data I collect to BAM, so it does not adhere to good programming practices.

package org.wso2.carbon.usage.agent.util;

import java.net.MalformedURLException;
import java.net.SocketException;
import java.net.UnknownHostException;

import javax.security.sasl.AuthenticationException;

import org.wso2.carbon.eventbridge.agent.thrift.Agent;
import org.wso2.carbon.eventbridge.agent.thrift.DataPublisher;
import org.wso2.carbon.eventbridge.agent.thrift.conf.AgentConfiguration;
import org.wso2.carbon.eventbridge.agent.thrift.exception.AgentException;
import org.wso2.carbon.eventbridge.commons.Event;
import org.wso2.carbon.eventbridge.commons.exception.DifferentStreamDefinitionAlreadyDefinedException;
import org.wso2.carbon.eventbridge.commons.exception.MalformedStreamDefinitionException;
import org.wso2.carbon.eventbridge.commons.exception.NoStreamDefinitionExistException;
import org.wso2.carbon.eventbridge.commons.exception.StreamDefinitionException;
import org.wso2.carbon.eventbridge.commons.exception.TransportException;

// Quick-and-dirty publisher that pushes exceeded-DB-usage events to BAM.
public class PublishUtil2 {
    public static final String STREAM_NAME1 = "org.wso2.db6.kpiii";
    public static final String VERSION1 = "1.0.6";
    private static String streamId1;
    private static DataPublisher dataPublisher = null;

    public static void publish(long exceededBytes, long databasesize, String tenentdomain)
            throws AgentException, MalformedStreamDefinitionException, StreamDefinitionException,
            DifferentStreamDefinitionAlreadyDefinedException, MalformedURLException,
            AuthenticationException, NoStreamDefinitionExistException,
            org.wso2.carbon.eventbridge.commons.exception.AuthenticationException,
            TransportException, SocketException, UnknownHostException {

        System.out.println("Starting BAM KPI Agent");

        // The Thrift agent needs the client trust store to accept the BAM server's certificate.
        AgentConfiguration agentConfiguration = new AgentConfiguration();
        String currentDir = System.getProperty("user.dir");
        System.setProperty("javax.net.ssl.trustStore",
                currentDir + "/repository/resources/security/client-truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "wso2carbon");
        Agent agent = new Agent(agentConfiguration);

        // Connect to the BAM server's Thrift receiver.
        try {
            dataPublisher = new DataPublisher("tcp://10.100.3.80:7613", "admin", "admin", agent);
        } catch (Throwable e) {
            e.printStackTrace();
            return; // without a publisher there is nothing more to do
        }

        // Reuse the stream if it is already defined on the BAM side;
        // otherwise define it with the metadata and payload fields we publish.
        try {
            streamId1 = dataPublisher.findEventStream(STREAM_NAME1, VERSION1);
            System.out.println("Stream already defined");
        } catch (NoStreamDefinitionExistException e) {
            streamId1 = dataPublisher.defineEventStream("{" +
                    "  'name':'" + STREAM_NAME1 + "'," +
                    "  'version':'" + VERSION1 + "'," +
                    "  'nickName': 'DSSUsage'," +
                    "  'description': 'Exceeded DB Use'," +
                    "  'metaData':[" +
                    "          {'name':'clientType','type':'STRING'}" +
                    "  ]," +
                    "  'payloadData':[" +
                    "          {'name':'exceededBytes','type':'LONG'}," +
                    "          {'name':'databasesize','type':'LONG'}," +
                    "          {'name':'tenentdomain','type':'STRING'}" +
                    "  ]" +
                    "}");
        }

        // Publish the event for a valid stream, then shut the publisher down.
        if (!streamId1.isEmpty()) {
            System.out.println("Stream ID: " + streamId1);
            publishEvents(tenentdomain, exceededBytes, databasesize);
            dataPublisher.stop();
        }
    }

    public static void publishEvents(String name, long exceededBytes, long databasesize)
            throws AgentException {
        System.out.println(name);
        publishEvents(dataPublisher, streamId1, name, exceededBytes, databasesize);
    }

    public static void publishEvents(DataPublisher dataPublisher, String streamId, String name,
            long exceededBytes, long databasesize) throws AgentException {
        // The payload must match the order and types declared in 'payloadData' above:
        // exceededBytes (LONG), databasesize (LONG), tenentdomain (STRING).
        Event eventOne = new Event(streamId, System.currentTimeMillis(),
                new Object[]{"external"}, null,
                new Object[]{exceededBytes, databasesize, name});
        dataPublisher.publish(eventOne);
    }
}

Problems I had to face.

I tried to change streamId1, but it was not possible; it gave an error on the BAM side. Then I got to know that the schema is saved under STREAM_NAME1, so if I want to change the stream ID I have to change STREAM_NAME1 too.

There was no way to check the published data, as the data viewer in BAM would show some rubbish. I got a nice client from one of my mentors that can read the data from the Cassandra cluster. It was written by Shariq Muhammed, a software engineer at WSO2. It was also written just to read the data, so he hasn't given much thought to good programming practices either. You can get it from the link below.


At first I couldn't send a long; it took me a while to figure out why. It was because the number I sent was being treated as an int. I had to add the 'L' suffix to the end of it to get it working.
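
For the record, the fix is just the literal suffix (the values here are made up):

public class LongLiteralExample {
    public static void main(String[] args) throws Exception {
        // An integer literal without a suffix is an int; 3221225472 does not even
        // fit in an int, so without the 'L' suffix this would not compile.
        PublishUtil2.publish(3221225472L, 5368709120L, "example.com"); // note the 'L'
    }
}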

Saturday, June 30, 2012

1st Verification – Verification of “index length + data length = actual disk space used”


I think I found where the DBs are stored. As I am working with XAMPP, they were stored in /opt/lampp/var/mysql, where there were folders for each database. When I tried to open them it gave me an error saying "you don't have permission". The next thing I tried was opening them in the terminal using sudo. Unluckily it gave me the error "sudo: cd: command not found" (cd is a shell built-in, so sudo cannot run it as a command). I have to find a way to open such folders; I'll write a post on this if I find one (or anyone who knows a way can help me by commenting below). Till then I used "sudo nautilus path/".

There were 3 files for each table in each DB: a .frm file, a .myd file and a .myi file. All three contribute to the space used:
  • FRM files contain the table structure information.
  • MYD files contain the actual table data.
  • MYI files contain the indexes.
"sudo ls -hs path" lists the files inside the folder given by the path, with their sizes.



-s, --size (print the allocated size of each file, in blocks)
-h, --human-readable (with -l and/or -s, print sizes in human readable format (e.g., 1K 234M 2G))



So it showed that the figures given in information_schema are similar to the results given by the above command, though I had not taken the size of the FRM file into account. It looks like all the .frm files have the same size (12K), so I can sum them up if I know the number of tables. However, I still have to check why others ignore this file in their calculations.

Some databases only have a .frm file

Going through the database folders, I saw that some folders only have a .frm file in them. Searching on that, I found out that there are two major storage engines in MySQL: MyISAM and InnoDB. If a table uses InnoDB, its folder contains only the .frm file; the data of such databases is stored in a single .ibd file or multiple .ibd files (you can configure that).

Prototype version 1 has to be verified.


This prototype was tested on my own machine with several MySQL users created by me, so it is not guaranteed to work in the real scenario. It has to work with the actual billing architecture to be of any use for my work. This is the second and biggest verification that I have to do.

Before that there is a simple and basic verification to consider. In the prototype code I calculate a table's size as follows (using information from information_schema.TABLES):

Table Size = Data Length + Index Length

Both Data Length and Index Length can be found in information_schema.TABLES. I have to verify that the figure given by the above calculation is the correct table size.
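
A quick way to check is to ask information_schema directly. A minimal sketch, assuming the MySQL Connector/J driver is on the classpath (the URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal sketch: per-database size as SUM(data_length + index_length)
// from information_schema.TABLES, to compare with the on-disk figures.
public class DbSizeCheck {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/information_schema", "root", "password");
        PreparedStatement st = con.prepareStatement(
                "SELECT table_schema, SUM(data_length + index_length) AS total_bytes "
                + "FROM information_schema.TABLES GROUP BY table_schema");
        ResultSet rs = st.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString("table_schema") + " : "
                    + rs.getLong("total_bytes") + " bytes");
        }
        rs.close();
        st.close();
        con.close();
    }
}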

Saturday, June 2, 2012

Suggestions and replies


After my suggestion, others started to give their feedback. There were a few interesting and informative responses, as listed below.

Sanjeewa Malalgoda: I was looking into this subject some time back and found some points. AFAIU the number of transactions alone is not enough. I found some interesting tools like dbtuna[1] and jetprofiler[2]; I hope we can have a look at them and get some ideas. I have tested jetprofiler and it gives a lot of information related to DB usage.



I went through them and came to the conclusion below, which I posted as my reply.

“It is always good to know a person who has worked in the same area. I went through those two tools and they mainly target the management and administrative aspects of the DB server. They give us nice graphical representations of existing data, which can be very useful for understanding users' usage patterns. But they do not give us any new information; they only present the data found in information_schema and the logs. As you have used them you might know more about them, so correct me if I am wrong.”

Jet Profiler

  1. Install Java 1.6 separately
  2. Unzip jetprofiler_v2.0.5.zip to the desired folder (e.g. /usr/local/bin/jetprofiler or /home/USER/bin/jetprofiler).
  3. Run ./jetprofiler

I continued


My suggestion from the previous day was not up to standard, and Amilam asked me to restructure it, so this is version 2.

Problem:

Until now we have not had a way to measure the usage of the StratosLive RSS. There are many possible views of the usage: I/O per unit time, accumulated DB size per user, bandwidth used by a user (I think this one is already taken into account). We need to start measuring (tracking) the usage under at least one view (ideally all). The real problem comes with the limitations of MySQL, which has no built-in support for the things mentioned above.

We cannot limit DB space: http://www.howtoforge.com/forums/showthread.php?t=1944
We cannot limit or measure the bandwidth/IO rate of an individual database/user: http://forums.mysql.com/read.php?10,543847,543847
There is a way to set MAX_QUERIES_PER_HOUR and MAX_CONNECTIONS_PER_HOUR, but there is no way to get the current state of those counters (without messing with the MySQL code base, http://forums.mysql.com/read.php?35,179219)

Solutions:

There are external tools suggested for limiting database size; we could do something like that. For example:
MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml
MySQL Quota-Tool: http://projects.marsching.org/mysql_quota/

And we can limit a user with MAX_QUERIES_PER_HOUR using MySQL's built-in support (http://dev.mysql.com/doc/refman/5.5/en/grant.html). This limit will not be a problem for the user, while the service is protected from an extreme number of requests per unit time (a kind of DoS).
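
For example, the limit can be applied when granting privileges. A sketch via JDBC; the URL, credentials, user name, database name and limit value are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: applying MySQL's built-in per-user resource limit at grant time.
public class ApplyQueryLimit {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mysql", "root", "password");
        Statement st = con.createStatement();
        st.execute("GRANT ALL ON tenantdb.* TO 'tenant_user'@'%' "
                + "WITH MAX_QUERIES_PER_HOUR 1000000");
        st.close();
        con.close();
    }
}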

And there are suggested ways to limit the DB size per user using cron jobs that notify the user of excess space usage.

How we can use what we have:

1. We can limit the size of the DB at creation time, based on the usage plan.
2. We can define transactions per hour for all or for selected usage plans.

Please reply with your feedback on this view; it will be very helpful...

Limiting The Resource Use


Resuming from where I stopped yesterday, I started by looking at how to limit resource use.

Current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR, MAX_USER_CONNECTIONS variables.


Yes, I can limit them, but the ultimate goal of this project is to measure resource use, so it is always better to have something that serves the prime goal. I needed to know whether I can get any information about the current state of the MAX_QUERIES_PER_HOUR, MAX_UPDATES_PER_HOUR, MAX_CONNECTIONS_PER_HOUR and MAX_USER_CONNECTIONS counters.

I checked the MySQL metadata schemas (information_schema, performance_schema)

If MySQL can limit it, it should keep a count somewhere, right? That was my thinking, so I started looking through the default schemas like information_schema, performance_schema, etc. Sorry to say, I found nothing. But I still think it must store this data somewhere.


Limiting the database size (MySQL Quota Daemon)




I went through the library I found the other day (MySQL Quota Daemon: http://lrem.net/software/mysql-quota-daemon.xhtml) and it is a good tool (it requires Perl). Another very small piece of code for doing the same thing is the MySQL Quota-Tool (http://projects.marsching.org/mysql_quota/).

Need to find the I/O rates and bandwidth used by each database in the MySQL server.


Where I was headed yesterday seems to have been misleading. I thought I had to measure the bandwidth used by the requests coming into the WSO2 Data Services Server. However, what I actually have to do is measure what happens in the queries between the web service end and the database end. At first I thought that simply summing the length fields of the SOAP requests would be enough, but I got to know that WSO2 already measures that. What I have to do is measure the connections made when the SOAP message is read and the DB queries are executed.

MySQL log files


I thought there could be interesting information in the MySQL log files, so I searched on that. The search gave me this resource (http://dev.mysql.com/doc/refman/5.6/en/server-logs.html), which describes the different types of server logs: the general log, the error log, the slow query log, etc. In the end I found them of no interest to my problem.

Server Status Variables

My next stop was the server status variables (http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html). These give a lot of valuable information about the server, closely related to measuring server usage. Nevertheless, our main problem of finding the usage of an individual database is still not solved.
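
For completeness, these counters can be read with SHOW GLOBAL STATUS; note they cover the whole server, with no per-database breakdown. A sketch (credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: reading a few server-wide counters from the status variables.
public class ServerStatus {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mysql", "root", "password");
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SHOW GLOBAL STATUS WHERE Variable_name IN "
                + "('Questions', 'Bytes_received', 'Bytes_sent')");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " = " + rs.getString(2));
        }
        rs.close();
        st.close();
        con.close();
    }
}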

Limiting the number of connections, size, number of queries


Being exhausted with trying to find a way to measure size and I/O rates, I started searching for a way to limit those parameters instead. Luckily I found something, even if it is not what I wanted. So I am thinking about a billing system where we do not measure but limit the user's usage. The user can select a plan that suits him; for example, a plan giving 20GB of space + 1M queries per hour.

How to limit other parameters like the number of connections, number of queries per hour, etc.: http://dev.mysql.com/doc/refman/5.5/en/grant.html

WSO2 Data Services Server User Guide


With the assigned job of figuring out a way to do billing for the WSO2 Data Services Server, I started by getting to know the product. I was midway through the WSO2 Data Services Server user guide at the start of the 28th of May 2012 (today). As I was not expected to fully understand its features, I went through them very quickly, but some of the features were so interesting that I stopped for a while to try them on my newly installed server.

After going through the user guide I had a fair understanding of the features and capabilities of the WSO2 Data Services Server. My next step was to look at any competitors in the market offering the same functionality. What I found was that there are no competitors offering the exact same or greater functionality, but Amazon Relational Database Service and HP Relational Database Service seemed close enough to compare. Of those two, the Amazon one is closer (though still not as complete as the WSO2 Data Services Server).

My Job

My job is to design and build a billing system for the WSO2 Data Services Server.




WSO2 Data Services Server:


Behind most application silos are heterogeneous and disparate data stores. The WSO2 Data Services Server augments Service Oriented Architecture (SOA) development efforts by providing an easy to use platform for integrating data stores, creating composite data views, and hosting data services.
Data services provide unprecedented data access and straightforward integration with business processes, mashups, gadgets, business intelligence and mobile applications. The WSO2 Data Services Server supports secure and managed data access across federated data stores, data service transactions, and data transformation and validation using a lightweight, developer friendly, agile development approach.
Our lean software development process creates an important customer benefit; our cost. WSO2 Data Services Server offers significant time saving and affordable acquisition. Purpose-built for rapid configuration and efficient extension, users agree the product is easy to configure and extend. These attributes lead to lower investment and higher ROI.

From: http://wso2.com/products/data-services-server