iDatamining.org

I am looking for projects to work on
Please contact me at yiyu.jia@iDataMining.org!

Tuesday, September 28, 2010

static variable in PHP vs static variable in Java

PHP is evolving into an object-oriented language. As both a Java programmer and a PHP programmer, I like to compare PHP with Java. In this post, I am going to write down my comparison of the "static" modifier in PHP and Java.

PHP and Java have different runtime architectures, so I feel it is necessary to point out the difference between PHP static and Java static. We can investigate this from at least two aspects: static variable scope and static variable access.

First, let's look at the difference in static variable scope. In Java, a static variable lives as long as the JVM is running, or until its class is unloaded by some technique. That means that once the static variable is in use, it stays in memory for as long as the Java application is running. There is also only one copy of a static variable in memory (within the scope of one class loader in a JVM process). In PHP, things are different because PHP keeps nothing in memory between script executions. Everything a script creates is discarded when the script (including all included and required script files) finishes. So variables are destroyed at that point even if they are declared "static" or "global". A PHP variable cannot survive across two different script executions. Another thing to pay attention to is that, in PHP, a programmer is not supposed to assign a reference to a static variable.
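To make the "only one copy" point on the Java side concrete, here is a minimal sketch. The class name StaticCounter and its fields are hypothetical and only serve as an illustration: the static field is shared by every instance, while the instance field is not.

```java
public class StaticCounter {
    // One copy per class (per class loader), shared by every instance.
    static int created = 0;

    // One copy per object.
    final int id;

    StaticCounter() {
        created++;      // updates the single shared copy
        id = created;   // remembers the value seen by this instance
    }

    public static void main(String[] args) {
        StaticCounter a = new StaticCounter();
        StaticCounter b = new StaticCounter();
        // Each object has its own id, but both see the same counter.
        System.out.println(a.id + " " + b.id);
        System.out.println(StaticCounter.created);
    }
}
```

The counter keeps its value until the JVM exits. A PHP script that declares a similar static property starts from the initial value again on every execution.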

Second, let's look at some differences in how static variables are accessed in PHP and Java. Both Java and PHP now have the concepts of "class" and "instance of class". A "className" refers to the class in code, and "new className()" creates an instance (object) of the class. But I noticed a difference between PHP and Java when accessing a static member of a class. In Java, a static variable can be referred to through an instance of the class, though this is not encouraged. In PHP, according to my test, a static property cannot be accessed through an instance with the arrow operator (->). Below are the code snippets.

 Java code
/**
 *
 * @author Yiyu Jia
 */
public class Main {

    public static String dummy = "dummy";

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        Main foo = new Main();
        // A static variable can be accessed through a class instance,
        // although this is not encouraged.
        System.out.println(foo.dummy);
    }

}

 PHP code
class DummyClass
{
    static $jia = 'yiyu';

    function testStatic()
    {
        print(self::$jia);
        echo $this->jia; // Does not work: a static property cannot be accessed with ->.
    }
}

Java and PHP have totally different runtime architectures. I am going to find a good way to present the difference later.

For a deeper understanding of the PHP static modifier, the URLs below are very helpful:

1) Static keyword: http://php.net/manual/en/language.oop5.static.php

2) Variable scope: http://php.net/manual/en/language.variables.scope.php

3) Something important now that PHP is object-oriented (returning references): http://www.php.net/manual/en/language.references.return.php

Wednesday, September 22, 2010

simulating Cube function by using Group By in MySQL

Some commercial RDBMSs, such as DB2, have a "cube" function. However, MySQL (5.1.48 on my machine) does not support cube yet. It supports the rollup function, but I need cube. So, I use a set of "group by" queries to simulate the cube function. It might not be the best solution, but it should be much better than issuing thousands of queries to calculate the sum values. The simulation has two steps:

1) List all possible combinations of column names. The total number of group by strings will be 2^n, where n is the number of columns (dimensions).

2) Run one SQL select query per group by clause to get the sum values we are interested in.
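The two steps above can be sketched in Java as follows. This is only a sketch under assumptions: the class CubeQueries, the table name "sales", the measure column "amount", and the dimension columns "region" and "product" are all hypothetical. Each bit mask from 0 to 2^n - 1 selects one subset of the dimension columns, and the columns left out of a subset are reported as NULL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CubeQueries {

    /** Builds one SELECT per subset of the dimension columns (2^n queries in total). */
    static List<String> buildQueries(String table, String measure, List<String> dims) {
        int n = dims.size();
        List<String> queries = new ArrayList<>();
        // Step 1: every bit mask from 0 to 2^n - 1 picks one combination of columns.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> grouped = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    grouped.add(dims.get(i));
                }
            }
            // Step 2: build the query for this combination.
            StringBuilder sql = new StringBuilder("SELECT ");
            for (String dim : dims) {
                // Columns that are not grouped on are reported as NULL.
                sql.append(grouped.contains(dim) ? dim : "NULL AS " + dim).append(", ");
            }
            sql.append("SUM(").append(measure).append(") AS total FROM ").append(table);
            if (!grouped.isEmpty()) {
                sql.append(" GROUP BY ").append(String.join(", ", grouped));
            }
            queries.add(sql.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table and column names, for illustration only.
        for (String q : buildQueries("sales", "amount", Arrays.asList("region", "product"))) {
            System.out.println(q);
        }
    }
}
```

Because every generated query returns the same column list, the results could also be combined into a single result set with UNION ALL. The column names are concatenated directly into the SQL text, so this sketch assumes they come from a trusted source.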

Monday, September 6, 2010

Convert DBLP(XML format) to a relational DB (MySQL)

I need a data set on which my new algorithm can be tested. Here I am posting my small piece of work on converting DBLP (in XML format) into MySQL.

Data source: http://www.informatik.uni-trier.de/~ley/db/ . I downloaded its XML file, which contains information about thousands of articles.

Target data storage: a simple database designed for MySQL. Below are the SQL statements for creating the tables.

CREATE TABLE `DBLP`.`articles` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT  PRIMARY KEY ,
`title` VARCHAR( 250 ) NOT NULL ,
`bookTitle` VARCHAR( 150 )  NOT NULL ,
`pages` CHAR( 13 ) ,
`year` CHAR( 4 ) NOT NULL ,
`mdate`  CHAR( 10 ) NOT NULL ,
UNIQUE (
`title`
)
) ENGINE = MYISAM  ;



CREATE TABLE `DBLP`.`authors` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`name` VARCHAR( 50 ) NOT NULL ,
UNIQUE (
`name`
)
) ENGINE = MYISAM ;

CREATE TABLE `DBLP`.`linkTbl` (
`article_id` BIGINT  UNSIGNED NOT NULL ,
`author_id` BIGINT UNSIGNED NOT NULL ,
`position`  TINYINT UNSIGNED NOT NULL ,
INDEX ( `article_id` , `author_id` )
) ENGINE = MYISAM ;

Then, I wrote a Java program to read the XML file and insert each article's information into the newly created tables.

The Java code uses the Xerces SAX parser to read and parse the XML file. We use a SAX parser rather than a DOM parser because the XML file is very large: it is over 700 MB on disk.
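For readers who have not used SAX before, here is a minimal sketch of the streaming approach. It is not the original program: it uses the SAX API that ships with the JDK (javax.xml.parsers) instead of a separate Xerces download, the class name DblpSaxSketch is hypothetical, and the tiny inline XML only imitates the element names of the DBLP file. The real file also references a DTD for its character entities, which this sketch does not deal with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DblpSaxSketch {

    /** Collects titles; a real loader would insert each record into MySQL instead. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start a fresh buffer only at <title>, so nested markup inside a title is kept.
            if ("title".equals(qName)) {
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver the text of one element in several chunks.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString().trim());
            }
        }
    }

    /** Parses the XML as a stream, without building the whole document in memory. */
    static List<String> parseTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample record, for illustration only.
        String xml = "<dblp><article mdate=\"2010-09-06\"><author>A. Author</author>"
                + "<title>A Sample Title</title><year>2010</year></article></dblp>";
        System.out.println(parseTitles(xml));
    }
}
```

The handler only ever holds the text of the current element, which is why this style of parsing stays cheap even for a file of several hundred megabytes.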

The Java code is small, but it is still too long to post here. So, I am only showing the important SQL queries from the Java code.

String articleInsertSql = "INSERT INTO `articles` (`id` ,`title` ,`bookTitle` ,`pages` ,`year` ,`mdate`)"
+ " VALUES (NULL , ?, ?, ?, ?, ?)"
+ " ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)";

String articles_lastId = "set @articleId=LAST_INSERT_ID()";

String  authorReplaceSql = "INSERT INTO `authors` (`id` ,`name`)"
+ " VALUES (NULL , ?)"
+ " ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)";

String authors_lastId = "set @authorId=LAST_INSERT_ID()";

String lookTableInsertSql = "INSERT INTO `linkTbl` (`article_id` ,`author_id` ,`position`)"
+ " VALUES (@articleId, @authorId, ?)";


Here, the strings articleInsertSql, articles_lastId, authorReplaceSql, authors_lastId, and lookTableInsertSql are executed sequentially as PreparedStatements. The clause " ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)" is worth highlighting: when the row already exists, it makes LAST_INSERT_ID() return the id of the existing row, so the following "set @articleId=LAST_INSERT_ID()" or "set @authorId=LAST_INSERT_ID()" assigns the right ID value to @articleId or @authorId.
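The execution order described above might look like the following JDBC sketch. It is not the original program: the class DblpInsertSketch, the method insertArticle, the loop over authors, and the position counter starting at 1 are my assumptions, and the two "set" statements are run through a plain Statement here for brevity. All statements must share one connection, because LAST_INSERT_ID() and user variables such as @articleId are kept per connection in MySQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class DblpInsertSketch {
    static final String ARTICLE_INSERT_SQL =
            "INSERT INTO `articles` (`id` ,`title` ,`bookTitle` ,`pages` ,`year` ,`mdate`)"
            + " VALUES (NULL , ?, ?, ?, ?, ?)"
            + " ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)";
    static final String AUTHOR_INSERT_SQL =
            "INSERT INTO `authors` (`id` ,`name`)"
            + " VALUES (NULL , ?)"
            + " ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id)";
    static final String LINK_INSERT_SQL =
            "INSERT INTO `linkTbl` (`article_id` ,`author_id` ,`position`)"
            + " VALUES (@articleId, @authorId, ?)";

    /** Inserts one article, its authors, and the link rows, all on the same connection. */
    static void insertArticle(Connection conn, String title, String bookTitle, String pages,
                              String year, String mdate, List<String> authors) throws SQLException {
        try (PreparedStatement article = conn.prepareStatement(ARTICLE_INSERT_SQL);
             PreparedStatement author = conn.prepareStatement(AUTHOR_INSERT_SQL);
             PreparedStatement link = conn.prepareStatement(LINK_INSERT_SQL);
             Statement plain = conn.createStatement()) {

            article.setString(1, title);
            article.setString(2, bookTitle);
            article.setString(3, pages);
            article.setString(4, year);
            article.setString(5, mdate);
            article.executeUpdate();
            // Remember the article id, whether the row was new or already existed.
            plain.execute("set @articleId=LAST_INSERT_ID()");

            int position = 1; // assumed: authors are numbered in the order they appear
            for (String name : authors) {
                author.setString(1, name);
                author.executeUpdate();
                plain.execute("set @authorId=LAST_INSERT_ID()");

                link.setInt(1, position++);
                link.executeUpdate();
            }
        }
    }
}
```

A caller would obtain the Connection from the MySQL JDBC driver and call insertArticle once for each article record delivered by the parser.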