one day, many perspectives, millions of new ideas

Updated schedule: 6th Data Science Day!

On your marks – get set – go!!!!!

Registration has started for our 6th Data Science Day on “Data-Driven Decisioning – an era of smart machines and smarter businesses!”

Believe me – this is one of the best line-ups of topics & speakers we have ever had… so be quick again. As always: first come, first served!


UPDATE: Sorry, but we are already booked out! After 2 hours! If you wanna be on our waiting list, pls. send an email to klaas.bollhoefer(at)

See what we are up to….

Schedule Data Science Day:

JUST IN: We might have one really famous guest speaker from the US attending! We'll keep you posted!

9:00 Doors open & breakfast

9:45 – 10:00
Klaas Bollhoefer, The unbelievable Machine Company & Steven Lemm, Zalando: “Welcome!”

10:00 – 10:45
Drew Conway, Project Florida: “Will work for data hard problems”
The narrative for the career arc of data scientists is broken. But this is a great thing both for those interested in entering this field and for those who want to attract talent. In this talk I will discuss how both suppliers and consumers of data science talent can leverage what they care about most, hard problems, into a better career or better business. The pursuit of unicorns is best left to fantasy writers. Here’s how to work with the next generation of data scientists.

Bio: Drew Conway is a leading expert in the application of computational methods to social and behavioral problems at large scale. Drew is the Head of Data at Project Florida, and has been writing and speaking about the role of data, and the discipline of data science, in industry, government, and academia for several years. Drew has advised and consulted companies across many industries, ranging from fledgling start-ups to Fortune 100 companies, as well as academic institutions and government agencies at all levels. Drew started his career in counter-terrorism as a computational social scientist in the U.S. intelligence community.

10:45 – 11:15
Andrew Cantino: “Huginn: Your Agents Are Standing By”
The landscape of personal automation is evolving rapidly. Tools such as Yahoo! Pipes, IFTTT, and Zapier allow us to link and control the many systems in which we communicate and store our data. Unfortunately, these tools are proprietary and closed, limiting expressiveness and ultimately leaving our data in someone else’s hands.  We can do better. Huginn is a free and open-source system for building personal agents that perform automated tasks for you online. Huginn’s agents can read the web, consume data feeds, watch the news, and take actions on your behalf. In this talk I will introduce the Huginn project, show how it can be used through some concrete examples, and then talk about the emerging field of smart agents and personal automation in general. This field includes declarative systems like Huginn, predictive systems like Google Now, and assistive systems such as Siri.

Just as automation in the 19th century revolutionized manufacturing, personal automation is positioned to revolutionize our interactions with the world. Our automated agents will keep a watchful eye, react on our behalf, and perform repetitive tasks, thus allowing us to synthesize and prioritize the ever-growing data around us.

Bio: Andrew Cantino is a programmer, startup technical manager, and open source software developer with a background in physics and machine learning. He is the VP of Engineering at Mavenlink, a startup bringing advanced project management and ERP to small businesses. He has been working on Huginn since late 2012.  To learn more, visit and follow Andrew on Twitter at @tectonic.

11:15 – 11:30 Coffee Break

11:30 – 12:00 
Sebastian Welter, IBM: The era of cognitive computing – IBM Watson
IBM Watson represents a first step into cognitive systems, a new era of computing. Watson builds on the current era of programmatic computing but differs in significant ways. A combination of capabilities makes Watson unique, including natural language processing, hypothesis generation and evaluation, and dynamic learning. Although none of these capabilities alone is unique to Watson, the combination delivers a powerful solution to move beyond the constraints of programmatic computing: to move from reliance on structured, local data to unlocking the world of global, unstructured data; to move from decision tree-driven, deterministic applications to probabilistic systems that co-evolve with their users; and to move from keyword-based search that provides a list of locations where an answer might (or might not) be located to an intuitive, conversational means of discovering a set of confidence-ranked responses.

Bio: Sebastian Welter is a Client Technical Architect and the technical software representative on IBM’s team for Germany, Austria and Switzerland. Active in this role since 2011, his responsibilities include Content Analytics and Content Management systems – everything that covers unstructured data, from managing to analyzing and storing.  His focus is on high-volume archives and Big-Data systems in unstructured data environments, as well as analysis of natural-language texts.

12:00 – 12:30
Roland Vollgraf, Zalando: “Machine learned Weight Watching”
It is of general interest for Zalando to know the weights of articles in stock. Although some of Zalando’s articles had been manually weighed, we were in the dark for the majority of items, since explicit weighing is expensive and time-consuming. To estimate the weight for all the articles in Zalando’s catalogue, we had at our disposal the measurements that had already been made, and the weights of all customer-bound packages that had been sent out of one of the warehouses. With this data, we developed a method that estimates a weight distribution for each article in the warehouse. The method has proven to be highly accurate, despite multiple error sources such as faulty scales, varying filling and packing materials, and occasional packages with incorrect items. In our talk, we will present how we got these accurate results, and where we are heading next.
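
For readers who want a feel for the problem: the abstract does not say which method the speakers use, so the following is only a toy sketch of one simple way to frame it. It assumes each package weight is the sum of its article weights plus a constant packaging offset, and fits those unknowns by ordinary least squares. The function names, the example articles and the packaging offset are all made up for illustration, and the sketch yields point estimates only, not the weight distributions mentioned in the talk.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so rows can be reduced together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("these packages do not determine every weight")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def estimate_article_weights(packages):
    """Least-squares fit of one weight per article plus a packaging offset.

    packages: list of (article_ids_in_package, measured_total_weight) pairs.
    Returns a dict mapping each article id (and "__packaging__") to a weight.
    """
    articles = sorted({a for contents, _ in packages for a in contents})
    names = articles + ["__packaging__"]  # hypothetical constant offset
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    # Accumulate the normal equations (A^T A) x = A^T b row by row.
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for contents, total in packages:
        row = [0.0] * n
        for article in contents:
            row[index[article]] += 1.0
        row[index["__packaging__"]] = 1.0
        for i in range(n):
            atb[i] += row[i] * total
            for j in range(n):
                ata[i][j] += row[i] * row[j]
    return dict(zip(names, solve_linear_system(ata, atb)))


# Made-up shipments in grams; no article is weighed on its own without packaging.
packages = [
    (["shoe"], 900.0),
    (["shirt"], 300.0),
    (["shoe", "shirt"], 1100.0),
    (["belt", "shirt"], 450.0),
    (["shoe", "belt", "belt"], 1200.0),
    (["shirt", "shirt"], 500.0),
]
weights = estimate_article_weights(packages)
for name, grams in sorted(weights.items()):
    print(name, round(grams, 1))
```

With noisy scales or occasional wrong items, as the abstract mentions, a plain least-squares fit like this would be pulled around by outliers, which is presumably part of why a more careful method was needed.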

12:30 – 1:00 Lunch Break

1:00 – 1:15
Impulse Talk by David Ho, Contact Singapore: Contact Singapore and Opportunities in the Data Analytics sector in Singapore
Contact Singapore is an alliance of the Singapore Economic Development Board and Ministry of Manpower. We engage overseas Singaporeans and global talent to work, invest and live in Singapore. We also actively link Singapore-based employers with professionals to support the growth of our key industries. We work with investors to realise their business and investment interests in Singapore. ‘Big data’ has arrived and is poised to become a key part of the Singaporean economy. The huge potential for big data is only just being uncovered, and experts predict the sector is destined to get much, much larger in a relatively short space of time. Last year, companies worldwide spent US$4.3 billion on software for big data projects. By 2016, spend on global big data technology could top US$23.8 billion. Being at the heart of Asia, Singapore’s location is seen as providing the perfect setting for companies looking to understand Asian consumers. With its culturally diverse population, the Red Dot offers the ideal location to test new made-for-Asia innovations, and a broad economy provides access to industry-specific knowledge crucial for coming up with the best analytics.

1:15 – 1:45 
Mikio Braun, Streamdrill: Analyzing Big Data Stream in Real-Time
When it comes to real-time big data, it’s less about analysis than about making your insights work. However, big data approaches which depend on parallelization alone usually require significant resource investments and setup costs to get up to speed. Streamdrill follows a different approach based on intelligent data management which focusses on the relevant data, allowing you to get started with one server. Modules for standard applications like profiling, trending, or recommendation further speed up the deployment process. We’ll present streamdrill and discuss some use cases and customer stories.

1:45 – 2:15
Michael Hausenblas: Apache Spark—the light at the end of the tunnel?
In this talk we will provide an introduction to Apache Spark, a distributed computing framework that recently graduated to a top-level Apache project. Spark offers HDFS integration but is not limited to Hadoop MapReduce, enabling in-memory cluster computing as well as stream processing workloads. Spark was originally developed at UC Berkeley’s AMPLab; in early 2013 a number of academics behind the open source project decided to spin out a company called Databricks and raised some $14 million to commercialise the product. We will have a look at Spark in action, contrast it with other solutions in areas including SQL on Hadoop, stream processing, and machine learning, and discuss upcoming developments around Spark.

Bio: Michael is MapR’s Chief Data Engineer, where he’s helping people to tap the potential of Big Data by bridging the technical (reliability, scalability, etc.) and the business side (RoI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications and he’s experienced in advocacy and standardisation. Michael has been using NoSQL datastores and Hadoop ecosystem components since 2008 and nowadays he’s sharing his experience with the Lambda Architecture and distributed systems through blog posts and public speaking engagements.  Last but not least, Michael is contributing to Apache Drill, a distributed system for interactive, ad-hoc analysis and query of large-scale datasets.

2:15 – 2:45 
Trent McConaghy: Artificial Intelligence and the Future of Cognitive Enhancement
We may marvel at our brains. But let’s be honest: they’re imprecise, they run slow, they learn slowly, and they forget. We may also marvel at how silicon-based computing and artificial intelligence (AI) often outperform our brains. But that has its own problems, especially: we’re not silicon. We’d prefer a future that includes humans over just silicon-based AI. Fortunately, the future doesn’t need to be “either-or”; the answer is cognitive enhancement, which can leverage our brains and silicon. Cognitive enhancement’s history ranges from abaci to maps to Twitter, with each invention enabling us to think, remember, or communicate better. The next 50 years will dwarf all past improvements. Today’s industrial state of the art provides a glimpse into this future: powerful AI-based computing plus natural interfaces enable engineers to quickly design radically complex systems, like computer chips with a billion transistors using extremely unreliable components. This state of the art in industry will spread to the rest of us in our daily lives. Through AI-based computing and novel interfaces, the tips of our brains will have access to convenient, fast, reliable, high-density computation, memory, and communication. These will not be tiny improvements; they will be order-of-magnitude changes that will radically alter our experience as humans. It’ll be fun!

Bio: Trent McConaghy is an engineer, entrepreneur, scientist, and author. He specializes in building industrial tools for designing computer chips, which leverage powerful AI-based computing plus natural interfaces. He is co-founder and CTO of Solido Design Automation, which has Nvidia, Qualcomm, and several more major chip companies as customers. While still in his twenties, he built and sold his first company, Analog Design Automation, to billion-dollar industry leader Synopsys. He has a PhD from KU Leuven, Belgium. He has written twenty patents, and two books on artificial intelligence and its relation to creativity and to reliable design. He has given invited talks at MIT, Berkeley, Jet Propulsion Lab, and many conferences on artificial intelligence and computer chips. Trent was raised on a farm in Saskatchewan, Canada.

2:45 – 3:15 Coffee Break

3:15 – 3:30 
Klaas Bollhoefer, The unbelievable Machine Company: “Wrap-up & Workshop planning”

3:30 – 5:00
Workshop Sessions & Product Demos

This time we have 3 workshops / discussion groups to choose from….

Workshop I: Realtime Data Stream Processing & Co. 

Wilfrid Hoge & Stephan Reimann of IBM will talk about:
Take action on sensor data in real-time based on analytics in R – with live demo

Machine and sensor data generated by the Internet of Things, sensors and computer processes provide a fantastic source for a completely new generation of Big Data applications. Imagine the possibilities if you apply machine learning and predictive analytics in real time to continuously optimize a large number of connected devices, and apply this to current challenges such as traffic, health or ecology. The session starts with a short overview of the use cases and the important general concepts of real-time streaming analytics; the live demo that follows will give a practical insight into its realization. The live demo will demonstrate how R models are applied to real-time data feeds using InfoSphere Streams, and will also cover advanced concepts such as dynamic model update.

After that Michael Hausenblas (MapR, Spark) & Mikio Braun (Streamdrill) will join and open up the discussion to give you a full-force head-dive into data stream processing and together will definitely be able to answer every single question you might have….

Workshop II: Data markets, Open Data, Wikidata & Co.

This workshop will start with two impulse talks before opening up the discussion….

Wikidata is a project by Wikimedia Deutschland with the goal of creating a multilingual data repository for Wikipedia and the world. It aims to be Wikimedia Commons for data, allowing Wikipedia editors to put factual information like the population of a city in one central database, instead of having to maintain it as text in dozens or hundreds of languages. The talk will give an overview of the software architecture of Wikidata and how it ties into Wikipedia. The talk will address the many technical and conceptual challenges that arise from the complexity and scale of the data. Among other things, I will describe how data records are transcluded between wikis, and how changes are recorded and propagated throughout the system. Another topic of the talk will be the data model. Wikidata can record not only statements, but also information about provenance, scope and accuracy, thus reflecting the diversity of knowledge available and supporting the notion of verifiability.

Bio: Anja Jentzsch is a PhD student at the Information Systems Group at Hasso-Plattner-Institute Potsdam and a member of the Open Knowledge Foundation Germany. She is a Linked Data enthusiast, being involved in several Linked Data projects since 2007. She is and has been working on Wikidata, DBpedia (Wikipedia as Linked Data), Silk (interlinking Linked Data sets) and LODD (Linking Open Drug Data).

MIA ‒ a Marketplace for Information and Analyses
MIA is a cloud-based software platform that hosts different data sources as well as algorithms that can be applied to the data. The biggest data source that we offer is a crawl of the German-speaking Web. At the moment, the crawl covers about half a billion web pages and is constantly updated. Two other data sources are conceptually subsets of the Web crawl: a collection of aggregated German news from over a thousand online media sources, which has been collected since 2008, and a collection of social media sources, in particular Web forums, provided by our partner VICO Research & Consulting. These data sources can be analyzed using a wide range of algorithms, which are integrated into the platform and are readily available. The algorithms have a strong focus on natural language processing, although they are not restricted to it. Using these algorithms, text documents from any of the data sources can be analyzed under different aspects, for instance which entities from the real world are mentioned (with disambiguated references to the Freebase knowledge base), which facts and relations are reported, or which sentiments are expressed by the author. Users can analyze and combine the data using any of the algorithms with MiaQL, an SQL92-based formal query language. MiaQL supports aggregation and joins of data; it allows users to access different data sources, including private data sets, and to call user-defined functions on the data. Finally, the MIA platform allows users to offer their data sets and algorithms on the MIA marketplace and thus allows data providers and algorithm developers to participate in the rising data economy.

Bio: Peter Adolphs is the project manager of MIA at Neofonie. He studied computer science and linguistics at the Humboldt University in Berlin and worked for 6 years at the German Research Centre for Artificial Intelligence (DFKI). His work focuses on developing and improving information access and knowledge management solutions by leveraging linguistic analyses at various levels and applying them to large amounts of data. The aim is to arrive at an application-specific semantics with minimal human effort, using Information Extraction, Question Answering and Semantic Web technologies, aided by Machine Learning.

Workshop III: Deep artificial cognitive and enhanced agents are standing by!

An interactive session with Andrew Cantino, Trent McConaghy & others. :-)

From 5:00 – open end 

Get-together with beer, pizza & lots of time to meet, share ideas & talk about the day!

A big thanks to our sponsors:

Zalando, HP, IBM, Contact Singapore, MapR, T-Systems & The unbelievable Machine Company GmbH (*um)

Enough food & drinks available all day. Talks and presentations are in English!


See ya there – awesome line-up again…..



This entry was posted on April 2, 2014.

Partnering with:

Big Data Week

Organized by:

Zalando AG
