Hadoop Summit San Jose is the premier event catering to business and technical audiences where you can learn about the technologies and business drivers that are transforming big data. All sessions are led by pragmatists and experts in their fields who have been vetted by the community. Attendees will develop an understanding of the key technologies powering modern data applications and the value they generate for businesses. Industry experts, business leaders, architects, data scientists and Hadoop developers will share use cases, success stories, best practices, cautionary tales and technology insights that provide practical guidance to novices as well as experienced practitioners of modern data infrastructure.
There are two major tracks with 9 topic areas:
Business Track - Agenda now live
Big data is transforming business, and modern data applications are creating new opportunities and better business outcomes. In this track you will learn from business leaders and innovators and understand the business benefits, challenges and secrets to success behind their transformations. Speakers come from different companies across industries and geographies, but they have one thing in common: they are leveraging data and open source technology for amazing business outcomes. This track is focused on the journey enterprises are experiencing as they move from legacy data stores to big data. Sessions will cover ROI, business benefits and success criteria, as well as hard-fought lessons learned along the way.
This track is focused on sharing business best practices, common roadblocks and enterprise solutions towards becoming a data-driven business.
Open innovation in Hadoop has transformed the enterprise, from corporate IT to the line of business. In this track you will learn from the technical innovators at the forefront of this transformation and understand the what, why and how from both a technical and a business perspective. Speakers will discuss tools, techniques and solutions for deriving business value and competitive advantage from the large volumes of data flowing through today’s enterprise. Sessions will cover case studies and tips for effective exploration of business data, visualization, and the solutions powering data-driven enterprises.
Any application or project that is part of, incubating in, or emerging from the extended Apache Hadoop ecosystem. Priority will be given to applications that are in production.
Technology Track - Agenda now live
Hear about the latest innovation within the Hadoop ecosystem from the community architecting and building Hadoop - the committers. These are the engineers and developers who lead the innovation in open source projects and can provide an insider's perspective. This track presents technical deep dives across a wide range of Apache topics and projects.
Any project in incubation or emerging from the extended Apache Hadoop ecosystem.
Insights from the data lake drive business innovation. Leveraging Hadoop for analytics is a key use case across industries and represents a critical value proposition for Hadoop. This track will include introductory to advanced sessions on applications, tools, algorithms and emerging research topics that extend the Hadoop platform for data science. Sessions will include examples of innovative analytics applications and systems, data visualization, statistics and machine learning. You will hear from leading data scientists, analysts and practitioners who are driving innovation by extracting valuable insights from data at rest as well as data in motion.
This track will cover practical advice and best practices for data applications in Hadoop, R, Python, Spark ML, SparkR, Spark SQL, Spark GraphX, Zeppelin, Magellan, and others.
With the growing volumes of diverse data being stored in the Data Lake, any breach of this enterprise-wide data can be catastrophic, from privacy violations and regulatory infractions to damage to corporate image and long-term shareholder value. This track focuses on the key enterprise requirements for governance, security and operations for the extended Hadoop ecosystem. As Hadoop emerges as a critical foundation of modern data applications, the enterprise has placed stringent requirements on it in these key areas. Speakers will present best practices with an emphasis on tips, tricks, and war stories on how to secure the Hadoop infrastructure. Sessions will cover the full deployment lifecycle for on-premise and cloud deployments, including installation, configuration, initial production deployment, recovery, security, and data governance for Hadoop.
This track will cover security and data governance technologies including: Atlas, Falcon, Ranger, Knox, NiFi Security & Governance, and others.
For a system to be "open for business", it must be efficiently managed by system administrators. This track covers the core practices and patterns for planning, deploying, and managing Hadoop clusters, from on-premise to cloud. We will also cover best practices for loading, moving, and managing data workflows. Sessions will range from getting started and operating your cluster to cutting-edge best practices for large-scale deployments.
Sample technologies that could be found in this track include: Ambari, Cloudbreak, Azure, GCE, AWS, OpenStack, Docker, Hadoop as a Service, Oozie, Sqoop, Flume, and others.
YARN has transformed Hadoop into a multi-tenant data platform. It is the foundation for a wide range of processing engines that empowers businesses to interact with the same data in multiple ways simultaneously. This means applications can interact with the data in the most appropriate way: from batch to interactive SQL or low latency access with NoSQL. You will have the opportunity to hear from the rock stars of the Hadoop community and learn how these innovators are building applications. You can then take that knowledge back to your own app projects.
Sample technologies that could be found in this track include: Core Spark, Spark SQL, Pig, Hive, Tez, ORC, HBase, Phoenix, Accumulo, Solr, Flink, Cascading, .NET, Spring and others.
Hadoop continues to drive innovation at a rapid pace and the next generation of Hadoop is being built today. This track showcases new developments in core Hadoop and closely related technologies. Attendees will hear about key projects, such as HDFS and YARN, projects in incubation and the industry initiatives driving innovation in and around the Hadoop platform. Attendees will interact with technical leads, committers, and expert users who are actively driving the roadmaps, key features, and advanced technology research around what is coming next for the Hadoop ecosystem.
Sample technologies that can be found in this track include: YARN, Slider, HDFS, Capacity Scheduler and others.
The increase in the number of sensors and connected devices is fueling data growth in Hadoop. The speed with which enterprises can make decisions based on data is critical to their competitive advantage. This track covers the state of the art in IoT, including managing devices at the “jagged edge”, strategies and practices for data ingestion and analysis, and best practices for deriving real-time actionable insights as the data flows from connected devices into Hadoop infrastructure. Attendees will hear from the technical leads, committers, and expert users who are actively driving the roadmaps and key features in emerging IoT technologies. Attendees will also learn how to use these technologies to develop IoT solutions.
This track will discuss use cases and integration points for key projects such as: NiFi, Storm, Kafka, Spark Streaming, Solr and others.
Apache Committer Insights
Andy Feng - VP Architecture, Yahoo!
Andy Feng is VP of Architecture at Yahoo!, leading the architecture and design of big data and machine learning initiatives. He architected major platforms for personalization, ads serving, NoSQL, and cloud infrastructure. Andy is a PPMC member and committer on the Apache Storm project and a contributor to the Apache Spark project. He has been a track chair for Hadoop Summit annually since 2013. Prior to Yahoo, Andy was Chief Architect at AOL/Netscape and Principal Scientist at Xerox.
Data Science, Analytics and Spark
Ram Sriharsha - Product Manager Apache Spark, Databricks
Ram Sriharsha is the Architect for Spark and Data Science at Hortonworks. Ram is an Apache Spark Committer and PMC Member. Prior to joining Hortonworks, he was Principal Research Scientist at Yahoo Research where he worked on large scale machine learning algorithms and systems related to login risk detection, sponsored search advertising and advertising effectiveness measurement.
Governance and Security
Seshu Adunuthula - Head of Analytics Infrastructure, eBay
Seshu is a seasoned software professional specializing in enterprise software, with a focus on middleware and business intelligence. He is currently interested in finding innovative ways to adapt business analytics and reporting to the cloud with large unstructured data sets.
Cloud and Operations
Cindy Gross – Big Data AzureCAT, Microsoft
Microsoft AzureCAT Cindy Gross leads customers through Azure-based Big Data engagements. She helps customers solve business problems with E2E solutions that include distributed data processing, Hadoop, HDInsight, Spark, Azure Data Lake, & the Internet of Things on Azure. Cindy is a SQL Server Microsoft Certified Master who loves to share knowledge through presentations, technical articles & blogs.
James Taylor - Architect, Salesforce
James Taylor is an architect at Salesforce in the Data Platform and Services Cloud. He leads the Apache Phoenix project, an OLTP database for Hadoop, and is a PMC member of Apache Calcite and the Apache Incubator. Prior to working at Salesforce, James worked at BEA Systems on projects such as federated query processing systems and event driven programming platforms and has worked at various other start-ups in the computer industry over the past 20 years.
Future of Apache Hadoop
Sanjay Radia - Founder and Architect, Hortonworks
Sanjay is founder and architect at Hortonworks. Sanjay is an Apache Hadoop committer and member of the Apache Hadoop PMC. Prior to co-founding Hortonworks, Sanjay was the chief architect of core-Hadoop at Yahoo and part of the team that created Hadoop. In Hadoop he has focused mostly on HDFS, MapReduce schedulers, high availability, compatibility, etc. He has also held senior engineering positions at Sun Microsystems and INRIA, where he developed software for distributed systems and grid/utility computing infrastructures. Sanjay has a PhD in Computer Science from the University of Waterloo in Canada.
Modern Data Applications
Jim Walker - Vice President of Marketing, EverString
Jim has nearly twenty years of experience building products and developing emerging technologies. During his career, he has brought multiple products to market in a variety of fields, including data loss prevention, master data management (MDM), Hadoop and now Predictive Analytics. Jim specializes in open source business models and is focused on accelerating the development and adoption of Predictive Analytics to augment the Sales & Marketing functions.
IoT and Streaming
Gearóid O’Brien – Distinguished Engineer, Neustar
Gearóid O'Brien is a Distinguished Engineer at Neustar, a trusted, neutral provider of real-time information services. Based in San Francisco, Gearóid was the founding data scientist at Datasnap.io, a company that delivered insight from proximity-triggered customer engagement. Datasnap.io was acquired by Neustar in 2015, and he now leads Data Analytics for Neustar's "Internet of Things" team. He holds a PhD in Electrical Engineering from Stanford University, focused on demand-side management of energy consumption using tools from data mining, machine learning, and statistics.