one day, many perspectives, millions of new ideas

DSDay7 – Booked out! Waiting list open….

Alright – here we are again! :-)

Our 7th Data Science Day is coming up – October 30 in Berlin – and we’ll (again) have an awesome list of topics & speakers, this time all focussing on “What’s next in Data Science?”

REGISTRATION: Crazy, but we are already booked out! After just 2 hours (again)!
Waiting list: Just send an email to klaas.bollhoefer(at)

DATE/TIME: Oct 30, from 9am till 9pm

LOCATION: New Zalando offices (thanks for your support again!) @ Tamara-Danz-Straße 1 (close to O2 Arena, Warschauer Str.), 10243 Berlin

So…what are we up to this time….


Doors open & breakfast

9:45 – 10:00
Klaas Bollhoefer, The unbelievable Machine Company & Steven Lemm, Zalando: “Welcome!”

10:00 – 10:40
Dirk Groeneveld, Allen Institute for Artificial Intelligence in Seattle: “Mining Data where Classification can’t Reach”

Impressive advances in machine learning have yielded big benefits to practitioners and to society as a whole, but the focus on better classification and prediction is leaving entire classes of problems unaddressed. In this talk I’ll explore some of these areas and introduce the approach that we’re using at AI2 to make progress.

10:40 – 11:00
Sven Fessler, IBM: “Watson Analytics: The Next Wave of Data and Analytics Services on the Cloud”

The presentation will cover IBM’s Watson Analytics project, a cloud-based, natural-language cognitive service that will enable business professionals of any industry and skill level to instantly access and use powerful predictive and visual analytic tools to help them make better business decisions. I will show that the solution offers the advanced analytics capabilities that have so far been the mainstay of data scientists. Watson Analytics removes the complexity and provides a user-friendly tool that makes it easier for professionals to use data to find answers and insights. With an entirely new class of cognitive capabilities, people can ask questions and get answers in natural language. With the solution, anyone will be able to get refined, trusted data that helps to discover insights, predict outcomes, visualize results and create simple, compelling reports.

11:00 – 11:30
Alexander Kagoshima, Pivotal: “Real-Time Journey Prediction From Car Sensor Data”

Intelligent prediction of driving behaviour has a wide range of applications that span from optimising fuel efficiency to avoiding traffic. At Pivotal, we have developed a prototype framework that collects real-time sensor data from drivers and uses Machine Learning techniques to build up a picture of the driver behaviour from it. The derived models then make range and route predictions tailored to individual drivers in real-time. For our prototype, we use Bluetooth dongles that connect to standard OBD II car sensor data ports. Together with a self-developed iOS app we can then stream this OBD II data into our framework’s big data infrastructure for long term storage, batch training processes and subsequent real-time analysis. In addition, we created a dashboard that shows the results of our real-time prediction during drive time. We will show how we used different Open-Source technologies to stream, store and reason over data in a scalable way while facilitating a real-time dashboard on top of processed information. In particular, we will focus on how we designed the Machine Learning framework to derive individual driver ‘fingerprints’ from variables such as speed, acceleration, driving times and location, taken from historical data. These fingerprints are then used within the real-time prediction framework to determine final journey destination and driving behaviour in real-time during the journey.

11:30 – 11:40
Lucia Santamaria, komoot: “My favorite algorithm”

11:40 – 12:00 Coffee Break

12:00 – 12:30
Prof. Dr. Christian Bauckhage, Fraunhofer IAIS: “Collective Attention on the Web”

The problem of understanding the dynamics of collective attention has been identified as a key scientific challenge for the information age. In this talk, we explore aspects of this problem and look at how search behaviors of large populations of Internet users evolve over time. We observe highly regular patterns that persist across countries, cultures, and topics. We interpret the empirical data in terms of psychological mechanisms and point out that, using data mining techniques, the collective interests of Web users appear to be highly predictable.

12:30 – 13:00
Torben Brodt, plista: “Latest in large scale recommendation engines and machine learning”

Torben will be coming right from the Recommender Systems conference in San Francisco. He will present some of the latest developments in the field of large scale recommendation engines and machine learning.

13:00 – 14:00 Lunch break

14:00 – 14:10
Fabian Hadiji: “My favorite algorithm”

14:10 – 14:40
Georg Urban, Microsoft & Dr. Artus Krohn-Grimberghe, Lytiq GmbH: “Paul the Octopus vs. Bing Prediction Engine: Data Science @Microsoft”

Microsoft has been using Data Science for years now in its own biotope. Products like Xbox, Bing, Cortana or Azure wouldn’t be possible without doing data science on a massive scale. With the advent of Big Data, the Internet of Things and the rapidly growing interest in advanced analytics, we are opening our labs and letting the beasts out. In our session we give a brief overview of what we are actually doing (use cases) and what our toolbox looks like.

14:40 – 15:10
János Moldvay, Jimdo: “Building Data Science Teams: use cases and lessons learned from building data science teams at three different companies”

This talk will be on a number of different data science use cases from four different businesses. It will specifically look at lessons learned regarding maximizing business impact and building data science teams.

15:10 – 16:00
Roland Memisevic, University of Montreal: “Latest in Deep Learning”

Deep Learning is the name of the current (and third) wave of scientific interest in neural networks within the last 60 years. I will describe why neural networks are suddenly hot again, and why this time they are likely to stay here for a very long time. I will also describe some of the latest research directions, and some of the modern software tools and computational tricks for dealing with difficult nonlinear optimization problems, which make deep networks so unruly (and interesting).

16:00 – 16:30 Coffee Break

16:30 – 16:35
Klaas Bollhoefer, The unbelievable Machine Company: “Wrap-up & Workshop planning”

16:35 – 18:00
Workshop Sessions & Product Demos

This time we have 3 workshops / discussion groups to choose from….

Workshop I: Hands-On Apache Drill
Hadoop opened the world to the ease of distributed processing of large quantities of data. Although it has powerful features, sometimes it’s hard to crunch and combine multiple data sources because they require (often slow) ETL processes. This was partially solved with Apache Hive, with its schema-on-creation flexibility. This approach allows the specification of how Hadoop should read the contents of a set of files. Apache Drill raises the bar by being ANSI SQL-compliant and truly schema-on-read. This session will be a hands-on demo of Drill’s capabilities and will show how one can perform ad-hoc querying, combining data from multiple sources in different formats.

Workshop II: Hands-On Microsoft Azure ML – the new cloud-based Machine Learning platform by Microsoft
Azure ML is the brand new and fully cloud-based Data Science and Machine Learning environment from Microsoft. This break-out session will focus on a Data Scientist’s daily tasks and how they can be executed in Azure ML. We will start with pre-processing and exploratory data analysis and continue to dive into predictive models, ensembles, their evaluation, and how they can be put into production and be consumed by applications from all over the world.

Workshop III: Discussion with the speakers

From 18:00 – 21:00
Get-together with beer, pizza & lots of time to meet, share ideas & talk about the day!

Last but not least I really want to thank our fantastic sponsors. This time: Zalando, Microsoft, IBM, MapR & The unbelievable Machine Company GmbH. Without them we wouldn’t be able to make this happen! So – THANKS!

So – see you in Berlin, again and/or soon!


This entry was posted on October 2, 2014.

Partnering with:

Big Data Week

Organized by:

Zalando AG
