Slides

https://github.com/keypointt/reading/blob/master/spark/2017_spark_spring_SF_IBM_dataframe.pdf

Notes

1. How DF, DS, and RDD Work

bucketing use case

SPARK-19008 enables generated code to use primitive int values

  • Avoids boxing/unboxing overhead when a Dataset program calls a lambda, written in Scala, that operates on a primitive type.
  • In that case, Catalyst can call the specialized method <primitiveType> apply(<primitiveType>); directly instead of the generic Object apply(Object);.
  • PR: https://github.com/apache/spark/pull/17172
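
The difference between the two call paths can be sketched in plain Java. This is only an analogy, not Spark's generated code: it uses the standard java.util.function interfaces in place of Scala's Function1, and the class and method names (BoxingSketch, callGeneric, callPrimitive) are hypothetical.

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; not taken from Spark or from the PR above.
class BoxingSketch {
    // Generic path, analogous to Object apply(Object):
    // the int argument is boxed to Integer and the result is unboxed again.
    static int callGeneric(Function<Object, Object> f, int value) {
        Object result = f.apply(value);   // autoboxing: int -> Integer
        return (Integer) result;          // unboxing: Integer -> int
    }

    // Primitive path, analogous to <primitiveType> apply(<primitiveType>):
    // the value stays an int for the whole call.
    static int callPrimitive(IntUnaryOperator f, int value) {
        return f.applyAsInt(value);
    }

    public static void main(String[] args) {
        int viaGeneric = callGeneric(x -> ((Integer) x) + 1, 41);
        int viaPrimitive = callPrimitive(x -> x + 1, 41);
        System.out.println(viaGeneric + " " + viaPrimitive); // prints "42 42"
    }
}
```

Both calls return the same value; the primitive path simply avoids allocating wrapper objects per element, which is the overhead the first bullet refers to.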

SPARK-14083 aims to let a future Spark version understand lambda expressions at the Java bytecode level and combine them

  1. Related paper: Jimple: Simplifying Java Bytecode for Analyses and Transformations (1998)
    • http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.7708
  2. T.J. Watson Libraries for Analysis (WALA): provides static analysis capabilities for Java bytecode and related languages, and for JavaScript
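
As a rough picture of what "combining" lambdas could mean, the sketch below composes two functions by hand so that one call replaces two chained ones. It performs no bytecode analysis and is not how Spark implements SPARK-14083; the class and method names (LambdaFusionSketch, twoStages, fused) are hypothetical, and Java streams stand in for a Dataset.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

// Hypothetical illustration of the end effect only.
class LambdaFusionSketch {
    // Two chained map stages: each element goes through f, then through g,
    // as two separate lambda invocations.
    static int[] twoStages(int[] data, IntUnaryOperator f, IntUnaryOperator g) {
        return IntStream.of(data).map(f).map(g).toArray();
    }

    // One map stage: f and g are first combined into a single function g(f(x)).
    static int[] fused(int[] data, IntUnaryOperator f, IntUnaryOperator g) {
        IntUnaryOperator combined = f.andThen(g);
        return IntStream.of(data).map(combined).toArray();
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        IntUnaryOperator addOne = x -> x + 1;
        IntUnaryOperator timesTwo = x -> x * 2;
        System.out.println(Arrays.toString(twoStages(data, addOne, timesTwo)));
        System.out.println(Arrays.toString(fused(data, addOne, timesTwo)));
    }
}
```

Here the composition is written explicitly with andThen; the idea noted above is that an engine able to read the lambdas' bytecode could discover and apply such a combination on its own.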

Reference: