Friday, November 14, 2014

Type System in Java

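Unified type system. In a unified type system, all types, including primitive types, inherit from a single root type. For example, C# has a unified type system: every type in C#, including value types such as int, ultimately inherits from the Object class (System.Object). Java does not. Its eight primitive types (byte, short, int, long, float, double, char, and boolean) are not objects. However, Java provides a wrapper class for each primitive type (Integer for int, Double for double, and so on), so developers can use either the wrapper object types or the simpler non-object primitive types. Since Java 5, the compiler converts between the two automatically (autoboxing and unboxing).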
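A small sketch of the primitive/wrapper split, assuming Java 5 or later for autoboxing. The class and method names are made up for illustration. It shows that an int has to be boxed into an Integer before it can be treated as an Object, and that generic collections only hold wrapper types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: primitives are not objects, wrappers are.
public class BoxingDemo {
    // Returning an int where an Object is expected forces a boxing
    // conversion; the compiler emits Integer.valueOf(value) here.
    static Object asObject(int value) {
        return value;
    }

    // List<int> is illegal because type arguments must be reference types,
    // so List<Integer> is used and each element is unboxed in the loop.
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) { // auto-unboxing via Integer.intValue()
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        Object o = asObject(42);
        System.out.println(o.getClass().getName());      // java.lang.Integer
        System.out.println(sum(Arrays.asList(1, 2, 3)));  // 6
    }
}
```

The literal 42 never becomes an object on its own. It only does so once it is boxed into an Integer.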

Type erasure. (to be completed later)

Generic types. Generics have been available in Java since J2SE 5.0, released in 2004. Generics allow for parameterized types and hence allow the compiler to enforce type safety. For example, an instance of the Vector class can be declared to be a vector of strings, written as Vector<String>. To avoid major changes to the Java runtime environment, generics are implemented using type erasure: the compiler checks the type arguments, then removes them and inserts casts wherever required. As a result, there is no way to distinguish between a List<String> and a List<Long> at runtime. Because the type arguments are erased during compilation, the objects carry no record of them, and both lists are simply of type List at runtime.
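Here is a minimal sketch of what erasure looks like from running code. The class and method names are hypothetical. Two lists with different type arguments share one runtime class. An unchecked cast between them also cannot be verified when it happens, so the error only surfaces later, at the cast the compiler inserts when an element is read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for type erasure.
public class ErasureDemo {
    // True when both lists have the same runtime class; the type
    // arguments play no part because they were erased.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    // The unchecked cast itself succeeds (there is nothing to check at
    // runtime); the failure appears at the compiler-inserted cast on read.
    static boolean uncheckedCastFailsLate() {
        List<String> strings = new ArrayList<String>();
        strings.add("hello");
        @SuppressWarnings("unchecked")
        List<Long> longs = (List<Long>) (List<?>) strings; // not verified at runtime
        try {
            Long first = longs.get(0); // compiled as (Long) longs.get(0)
            System.out.println("unexpected: " + first);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Long> longs = new ArrayList<Long>();
        System.out.println(sameRuntimeClass(strings, longs)); // true
        System.out.println(strings.getClass().getName());     // java.util.ArrayList
        System.out.println(uncheckedCastFailsLate());         // true
    }
}
```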

Arrays. Java’s arrays are in effect parameterized types, written with a different syntax from generic types. For example, the array type String[] is analogous to the vector type Vector<String>. Unlike generic types, however, Java arrays are not subject to type erasure: an array is reified, meaning it knows its component type at runtime. This leads to inconsistency in how arrays and generic types are handled. One consequence is that the following code will not compile:
import java.util.Vector;

class Test<T> {
  public Vector<T> getVector() { return new Vector<T>(); } // ok
  public T[] getArray() { return new T[10]; } // compile-time error: generic array creation
}
Since arrays are not erased, method getArray needs the runtime type of T to create the array. However, because Java generics are implemented using type erasure, the type parameter T does not exist at runtime. Hence, the compiler cannot determine which component type the new array should have and raises a "generic array creation" error.
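One common workaround, which is not taken from this post, is to pass the element type in explicitly as a Class<T> object and create the array reflectively with java.lang.reflect.Array.newInstance. The sketch below assumes that approach, and the class name TypedArrayFactory is hypothetical. The main method also shows that the resulting array is reified: storing the wrong element type is caught at runtime.

```java
import java.lang.reflect.Array;

// Hypothetical replacement for Test<T>: the Class<T> token stands in
// for the type information that erasure removes.
public class TypedArrayFactory<T> {
    private final Class<T> type; // runtime representation of T

    public TypedArrayFactory(Class<T> type) {
        this.type = type;
    }

    public T[] getArray(int length) {
        // Array.newInstance returns Object; the cast is unchecked but safe,
        // because the array's component type really is `type`.
        @SuppressWarnings("unchecked")
        T[] array = (T[]) Array.newInstance(type, length);
        return array;
    }

    public static void main(String[] args) {
        TypedArrayFactory<String> factory = new TypedArrayFactory<String>(String.class);
        String[] names = factory.getArray(10);
        System.out.println(names.length);                        // 10
        System.out.println(names.getClass().getComponentType()); // class java.lang.String

        // Arrays are reified: the JVM rejects a store of the wrong type.
        Object[] objects = names;
        try {
            objects[0] = Integer.valueOf(1);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
    }
}
```

The shortcut (T[]) new Object[length] also compiles, but it produces an Object[]. Assigning the result to a String[] in the caller would then fail with a ClassCastException. That is why the sketch passes the Class token along instead.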

Type parameters. (to be completed later)

Type inference. As of Java 8, Java doesn't support type inference for local variable declarations (Java 10 later added the var keyword for this). Hence, programmers must explicitly declare the types of their variables and of their methods' parameters and return values. However, this does not mean Java has no type inference at all. Indeed, recent versions of Java support type inference when using generics. Java 7 added the diamond operator, which infers the type arguments when instantiating a generic class:
Map<String, String> foo = new HashMap<>();
The compiler fills in the empty angle brackets with <String, String>, taken from the declared type of foo. Moreover, Java 8 extends type inference to lambda expressions. For example, consider
List<String> names = Arrays.asList("Tom", "Dick", "Harry");
Collections.sort(names, (first, second) -> first.compareTo(second));
The Java compiler can infer from the signatures Collections#sort(List<T>, Comparator<? super T>) and Comparator#compare(T o1, T o2) that first and second are both of type String, allowing the programmer to omit the type declarations in the lambda expression.
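A runnable sketch, assuming Java 8, that puts the diamond operator and the inferred lambda side by side with a version that spells out every type. The class and method names are hypothetical. Both methods sort the same names and should return the same result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class comparing inferred and explicit typing.
public class InferenceDemo {
    static List<String> sortInferred(List<String> input) {
        List<String> names = new ArrayList<>(input); // diamond: <String> inferred
        // first and second are inferred as String from Comparator<? super String>
        Collections.sort(names, (first, second) -> first.compareTo(second));
        return names;
    }

    static List<String> sortExplicit(List<String> input) {
        List<String> names = new ArrayList<String>(input); // type argument written out
        Collections.sort(names, (String first, String second) -> first.compareTo(second));
        return names;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Tom", "Dick", "Harry");
        System.out.println(sortInferred(input)); // [Dick, Harry, Tom]
        System.out.println(sortExplicit(input)); // [Dick, Harry, Tom]
    }
}
```

The input is copied before sorting so that the list returned by Arrays.asList is left untouched.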

The reason Java doesn't allow type inference for non-generic types seems to be its design philosophy: programmers should write things explicitly to make sure that the compiler has the same understanding of the code as they do. Besides, Java was originally aimed at programmers coming from C++, Pascal, or other mainstream languages that did not have type inference. Thus, type inference was probably left out on the principle of least surprise.

Sunday, November 9, 2014

Spark for Beginners

Set up a Spark environment on a local Ubuntu machine:
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/

Write Spark apps locally with IntelliJ and run them on a remote cluster:
http://blog.csdn.net/Camu7s/article/details/45530295

Run Spark apps on Windows without installing Hadoop:
http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7

Compile and install Hadoop on Windows:
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows

Install Dato on CentOS 6.4

Dato needs Python 2.7, while CentOS 6.4 ships with Python 2.6. So you first have to install Python 2.7 as an alternative build alongside the system Python (leave the system Python 2.6 in place, since tools such as yum depend on it), together with the libraries needed to compile Python modules:
sudo yum install -y readline-devel sqlite-devel bzip2-devel.i686 \
openssl-devel.i686 gdbm-devel.i686 libdbi-devel.i686 ncurses-libs 
cd /tmp

# install zlib manually because the default one for CentOS is too old
wget http://zlib.net/zlib-1.2.8.tar.gz
tar -zxvf zlib-1.2.8.tar.gz
cd zlib-1.2.8
./configure
# use && (not &) so that the install step runs only after make succeeds
make && sudo make install

# install Python 2.7.6
wget https://www.python.org/ftp/python/2.7.6/Python-2.7.6.tgz
tar -zxvf Python-2.7.6.tgz
cd Python-2.7.6

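# IMPORTANT! --enable-shared builds libpython2.7 as a shared library;
# --enable-unicode=ucs4 selects a UCS-4 ("wide") Unicode build
./configure --enable-shared --enable-unicode=ucs4

# IMPORTANT! altinstall installs the binary only as python2.7 and does not
# overwrite the system python; && runs it only if make succeeds
make && sudo make altinstall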
You also need to install setuptools, pip, and virtualenv before running Dato. The installation steps, however, are standard; see, e.g., here for details.