A very nice article explaining smoothsort!
http://www.keithschwarz.com/smoothsort/
Friday, December 12, 2014
Sunday, December 7, 2014
Cloud Design Patterns
http://msdn.microsoft.com/en-us/library/dn568099.aspx
"This book contains twenty-four design patterns and ten related guidance topics, this guide articulates the benefit of applying patterns by showing how each piece can fit into the big picture of cloud application architectures. It also discusses the benefits and considerations for each pattern. Most of the patterns have code samples or snippets that show how to implement the patterns using the features of Microsoft Azure. However the majority of topics described in this guide are equally relevant to all kinds of distributed systems, whether hosted on Azure or on other cloud platforms."
"This book contains twenty-four design patterns and ten related guidance topics, this guide articulates the benefit of applying patterns by showing how each piece can fit into the big picture of cloud application architectures. It also discusses the benefits and considerations for each pattern. Most of the patterns have code samples or snippets that show how to implement the patterns using the features of Microsoft Azure. However the majority of topics described in this guide are equally relevant to all kinds of distributed systems, whether hosted on Azure or on other cloud platforms."
Friday, November 14, 2014
Type System in Java
Unified type system. In a unified type system, all types, including primitive types, inherit from a single root type. For example, C# has a unified type system, as every type in C# inherits from the Object class. Java, by contrast, has several primitive types that are not objects. However, Java provides wrapper object types that exist together with the primitive types, so developers can use either the wrapper object types or the simpler non-object primitive types.
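As a quick illustration (my own sketch, not from any referenced material), the compiler converts between a primitive and its wrapper automatically, and a primitive must be boxed before it can be treated as an Object:

    public class Boxing {
        public static void main(String[] args) {
            int p = 42;
            Integer boxed = p;        // autoboxing: int -> Integer
            int unboxed = boxed + 1;  // auto-unboxing: Integer -> int
            Object o = p;             // the primitive is boxed before use as an Object
            System.out.println(o.getClass().getName()); // java.lang.Integer
            System.out.println(unboxed);                // 43
        }
    }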
Type erasure. (to be completed later)

Generic types. Generics have been available in Java since J2SE 5.0 was released in 2004. Generics allow for parameterized types and hence allow the compiler to enforce type safety. For example, an instance of the Vector class can be declared to be a vector of strings, written as Vector<String>. To avoid major changes to the Java runtime environment, generics are implemented using type erasure, which amounts to removing the additional type information and adding casts wherever required. For example, there is no way to distinguish between a List<String> and a List<Long> at runtime: since the JVM doesn't track the type arguments of a generic class in the bytecode, both of them are of type List at runtime.
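One way to observe erasure directly (a sketch of mine, not from the original post) is to compare the runtime classes of two differently parameterized lists:

    import java.util.ArrayList;
    import java.util.List;

    public class ErasureDemo {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<String>();
            List<Long> longs = new ArrayList<Long>();
            // Both lists share the same runtime class: the type
            // arguments were erased by the compiler.
            System.out.println(strings.getClass() == longs.getClass()); // true
            System.out.println(strings.getClass().getName());           // java.util.ArrayList
        }
    }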
Arrays. Java's arrays are parameterized types written in a form different from the generic types. For example, the array type String[] is analogous to the vector type Vector<String>. On the other hand, Java arrays are not subject to type erasure like generic types are, and this leads to inconsistency in the handling of arrays and generic types. One consequence of this is that the following code will not compile:

    class Test<T> {
        public Vector<T> getVector() { return new Vector<T>(); } // ok
        public T[] getArray() { return new T[10]; }              // compile-time error
    }

Since arrays don't undergo type erasure, the method getArray needs the type information of T to create the array. However, as Java generics are implemented using type erasure, the parameterized type T does not exist at runtime. Hence, the compiler cannot assign a type to the array and raises a "generic array creation" error.
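A common workaround (my sketch; the helper names are illustrative, not from the post) is to pass the element class explicitly and create the array reflectively, so the Class token carries the type information that erasure removed:

    import java.lang.reflect.Array;
    import java.util.Vector;

    class Test<T> {
        private final Class<T> elementType;

        Test(Class<T> elementType) {
            this.elementType = elementType;
        }

        public Vector<T> getVector() {
            return new Vector<T>(); // fine: erased to a raw Vector at runtime
        }

        @SuppressWarnings("unchecked")
        public T[] getArray() {
            // The Class token supplies the runtime type of the elements.
            return (T[]) Array.newInstance(elementType, 10);
        }
    }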
Type parameters. (to be completed later)

Type inference. Java doesn't support type inference for variable assignments; it requires that programmers declare the types they intend a method or variable to use. However, it is not true that Java has no type inference at all. Indeed, recent versions of Java support type inference when using generics. In Java 7, one gets some additional inference when instantiating generics, like so:

    Map<String, String> foo = new HashMap<>();

Java is smart enough to fill in the empty angle brackets for us. Moreover, Java 8 brings type inference as part of lambda expressions. For example, consider:
    List<String> names = Arrays.asList("Tom", "Dick", "Harry");
    Collections.sort(names, (first, second) -> first.compareTo(second));

The Java compiler can infer from the signatures Collections#sort(List<T>, Comparator<? super T>) and Comparator#compare(T o1, T o2) that first and second should be Strings, allowing the programmer to omit the type declarations in the lambda expression.

The reason why Java doesn't allow type inference for non-generic types seems to be its design philosophy: programmers should write things explicitly, so that the compiler has the same understanding of the code as they do. Besides, Java was originally aimed at programmers coming from C++, Pascal, and other mainstream languages that did not have type inference, so it was probably left out on the principle of least surprise.
Sunday, November 9, 2014
Spark for Beginners
Set up a Spark environment on a local Ubuntu machine:
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
Write Spark apps locally in IntelliJ and run them on a remote cluster:
http://blog.csdn.net/Camu7s/article/details/45530295
Run Spark apps on Windows without installing Hadoop:
http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7
Compile and install Hadoop on Windows:
http://stackoverflow.com/questions/18630019/running-apache-hadoop-2-1-0-on-windows
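After working through these links, a first app usually looks like the classic word count. Here is a minimal sketch of mine against the Spark 1.x Java API (the input/output paths and the local[*] master are illustrative assumptions):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class WordCount {
        public static void main(String[] args) {
            // Run locally with as many worker threads as logical cores
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split(" "))) // one record per word
                    .mapToPair(word -> new Tuple2<>(word, 1))        // (word, 1) pairs
                    .reduceByKey((a, b) -> a + b);                   // sum counts per word

            counts.saveAsTextFile(args[1]);
            sc.stop();
        }
    }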
Install Dato on CentOS 6.4
Dato needs Python 2.7, while CentOS uses Python 2.6. So first you have to install Python 2.7 as an alternative build of Python on your system, as well as the libraries needed to compile Python modules:
    sudo yum install -y readline-devel sqlite-devel bzip2-devel.i686 \
        openssl-devel.i686 gdbm-devel.i686 libdbi-devel.i686 ncurses-libs

    cd /tmp

    # install zlib manually because the default one for CentOS is too old
    wget http://zlib.net/zlib-1.2.8.tar.gz
    tar -zxvf zlib-1.2.8.tar.gz
    cd zlib-1.2.8
    ./configure
    make && sudo make install

    # install Python 2.7.6
    wget https://www.python.org/ftp/python/2.7.6/Python-2.7.6.tgz
    tar -zxvf Python-2.7.6.tgz
    cd Python-2.7.6
    ./configure --enable-shared --enable-unicode=ucs4   # IMPORTANT!
    make && sudo make altinstall

You also need to install setuptools, pip, and virtualenv before running Dato. The installation steps, however, are standard; see e.g. here for details.
Saturday, October 11, 2014
Combiners in Hadoop MapReduce
A combiner function in MapReduce has the same form as the reduce function (and is an implementation of the Reducer interface), except its output types are the intermediate key and value types (K2 and V2), so they can feed the reduce function:
combiner: (K2, list(V2)) → list(K2, V2)
Often the combiner and reduce functions are the same, in which case K3 is the same as K2, and V3 is the same as V2. On the other hand, Hadoop reserves the right to use combiners at its discretion: a combiner may be invoked zero, one, or multiple times. Hence, the correctness of a MapReduce algorithm must not depend on computations performed by the combiner, or even on whether the combiner runs at all.
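As a concrete illustration (my own sketch against the Hadoop 2.x API, not from the original text), the word-count reducer can double as a combiner because integer addition is commutative and associative:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums the counts for a word; usable as both reducer and combiner
    // because its input and output types coincide (Text, IntWritable).
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // In the job driver:
    // job.setCombinerClass(IntSumReducer.class); // Hadoop may run it zero or more times

By contrast, a reducer that computes a mean cannot be reused as a combiner, since a mean of partial means is not, in general, the overall mean.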
Sunday, October 5, 2014
Guidance for Scientific Writing
The Structure, Format, Content, and Style of a Journal-Style Scientific Paper
Typesetting mathematics for science and technology according to ISO 31/XI
According to the standard, constants like the imaginary unit i and Euler's number e, as well as the differential operator d, should be set upright. However, these typesetting rules seem to be ignored by many respected authors and publishers. See this thread for some interesting discussion of the issue: What's the proper way to typeset a differential operator? For an easy solution, one of the replies there suggests typesetting a math paper in LaTeX using the package commath.
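For instance, a minimal LaTeX sketch (my own; as far as I recall, commath provides \dif for the upright differential and \od for ordinary derivatives, so double-check its documentation):

    \documentclass{article}
    \usepackage{amsmath}
    \usepackage{commath} % provides \dif (upright d) and \od{f}{x} for derivatives

    % Upright constants per ISO 31/XI, defined here by hand.
    \newcommand{\eu}{\mathrm{e}}
    \newcommand{\iu}{\mathrm{i}}

    \begin{document}
    \[ \int_0^1 f(x) \dif x, \qquad \od{f}{x}, \qquad \eu^{\iu\pi} + 1 = 0 \]
    \end{document}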