English | April 23rd, 2019 | ISBN: 183864413X | 182 pages | 5.36 MB
Copyright © 2019 Packt Publishing
Authors:
- Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie.
- Rudy Lai is the founder of QuantCopy, a sales acceleration start-up using AI to write sales emails to prospective customers.
- Bartłomiej Potaczek is a software engineer working for Schibsted Tech.
What you will learn
• Get practical big data experience while working on messy datasets
• Analyze patterns with Spark SQL to improve your business intelligence
• Use PySpark’s interactive shell to speed up development time
• Create highly concurrent Spark programs by leveraging immutability
• Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation
• Re-design your jobs to use reduceByKey instead of groupBy
• Create robust processing pipelines by testing Apache Spark jobs
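The point about preferring reduceByKey over groupBy can be sketched in plain Python. This is not the Spark API: the `partitions` data and the two helper functions are hypothetical, invented only to illustrate the idea. Combining values inside each partition before the shuffle means fewer records have to be moved.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (word, count) pairs.
partitions = [
    [("spark", 1), ("shuffle", 1), ("spark", 1), ("spark", 1)],
    [("shuffle", 1), ("spark", 1), ("shuffle", 1)],
]


def group_then_sum(parts):
    """groupBy-style: every record crosses the shuffle, then values are summed."""
    shuffled = [pair for part in parts for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def combine_then_sum(parts):
    """reduceByKey-style: sum inside each partition, shuffle only partial sums."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


grouped_totals, grouped_records = group_then_sum(partitions)
reduced_totals, reduced_records = combine_then_sum(partitions)

print(grouped_totals, grouped_records)
print(reduced_totals, reduced_records)
```

Both approaches give the same totals, but the second moves fewer records between partitions. In PySpark itself, the corresponding calls on a pair RDD would be along the lines of `rdd.reduceByKey(lambda a, b: a + b)` versus `rdd.groupByKey().mapValues(sum)`.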