Slides
https://github.com/keypointt/reading/blob/master/spark/2017_spark_spring_SF_IBM_dataframe.pdf
note
1. How DF, DS, and RDD Work
2. related ticket: SPARK-19008
SPARK-19008 enables generated code to use int values directly
- Avoids boxing/unboxing overhead when a Dataset program calls a Scala lambda that operates on a primitive type.
- In such a case, Catalyst can directly call a method
<primitiveType> apply(<primitiveType>);
instead of
Object apply(Object);
- PR: https://github.com/apache/spark/pull/17172
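To make the boxing point concrete, here is a minimal plain-Java sketch of the two call shapes. It is only an analogy and not Spark's generated code: the class and method names (BoxingSketch, sumBoxed, sumPrimitive) are made up for illustration, java.util.function.Function stands in for the erased Object apply(Object) path, and IntUnaryOperator stands in for the <primitiveType> apply(<primitiveType>) path.

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration class; nothing here is part of Spark.
class BoxingSketch {
    // Generic path: after erasure this is Object apply(Object), so every
    // int is boxed into an Integer for the call and unboxed afterwards.
    static long sumBoxed(int[] values, Function<Integer, Integer> f) {
        long total = 0;
        for (int v : values) {
            total += f.apply(v); // box v, call, unbox the result
        }
        return total;
    }

    // Primitive path: int applyAsInt(int), so no Integer objects are needed.
    static long sumPrimitive(int[] values, IntUnaryOperator f) {
        long total = 0;
        for (int v : values) {
            total += f.applyAsInt(v);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        // Both calls compute the same sum; only the call signature differs.
        System.out.println(sumBoxed(values, x -> x + 1));
        System.out.println(sumPrimitive(values, x -> x + 1));
    }
}
```

Both methods return the same result. The difference is only whether each element goes through an Integer wrapper on the way in and out of the lambda, which is the overhead the ticket aims to avoid.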
3. related ticket: SPARK-14083 (still open)
SPARK-14083 would allow a future version of Spark to understand the Java bytecode of lambda expressions and to combine them
- related paper: Jimple: Simplifying Java Bytecode for Analyses and Transformations (1998)
- http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.7708
- The T. J. Watson Libraries for Analysis (WALA) provide static analysis capabilities for Java bytecode and related languages, and for JavaScript
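As a rough picture of what "combining" two lambdas could mean, the plain-Java sketch below composes two functions by hand so that one pass over the data replaces two. This is an assumption-laden analogy, not how SPARK-14083 works: the ticket is about deriving this kind of information from bytecode automatically, whereas here the composition is written explicitly with IntUnaryOperator.andThen. The names FusionSketch and mapEach are hypothetical.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration class; nothing here is part of Spark.
class FusionSketch {
    // Applies f to every element and returns a new array (one full pass).
    static int[] mapEach(int[] in, IntUnaryOperator f) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = f.applyAsInt(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        IntUnaryOperator addOne = x -> x + 1;
        IntUnaryOperator twice = x -> x * 2;

        // Two separate passes, with an intermediate array in between.
        int[] separate = mapEach(mapEach(values, addOne), twice);

        // One pass with the two lambdas combined into a single function.
        int[] combined = mapEach(values, addOne.andThen(twice));

        // Same result either way; the combined form skips the intermediate array.
        System.out.println(Arrays.equals(separate, combined));
    }
}
```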
Reference: