Slides
https://github.com/keypointt/reading/blob/master/spark/2017_spark_spring_SF_SalesForce_scaling.pdf
Note
Yes, it’s more of a commercial, but I still like it.
The architecture relies heavily on AWS, with S3 for storage. At scale, whether it’s ETL or model training, the work is always parallelized and distributed. The scaling recipe is more memory, more disk, and better CPUs.
Reference: