Scala Meetup: Traceable Data Entities with Spark 2.x

One of the most common abstractions for a big data platform is a “Data Lake”. Data is brought into the lake, where it is filtered, parsed and transformed, and in the process many more data assets are created. Metadata describes this data, and as the amount of data grows, it becomes both more important and harder to properly describe the data, its schema and its lineage.

Spark (2.x) is one of the most important tools in a data engineer’s or data scientist’s toolbox, but it currently offers very little help in connecting input data sources to output data sources.

At Sqooba, the team extended Spark’s built-in event mechanism to capture more granular information about when data is used as an input or output of a Spark application. This lets event listeners update the relevant entities in Apache Atlas, providing real-time data lineage and metadata.
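The abstract describes the mechanism only at a high level. As a hedged, library-free illustration of the idea (this is not Sqooba’s code and not Spark’s or Atlas’s API), the Scala sketch below models it with hypothetical names: `DataEvent`, `InputUsed`, `OutputWritten`, `DataEventListener`, `DataEventBus` and `LineageListener` are all invented for this example. Read and write events are posted to a bus, and a listener turns them into input-to-output lineage edges that a real system would forward to Apache Atlas. It is written as a Scala script (top-level statements).

```scala
import scala.collection.mutable

// Hypothetical event types for "a dataset was read" and "a dataset was written"
// by a given application. Illustrative only: NOT Spark's or Atlas's API.
sealed trait DataEvent { def appId: String; def dataset: String }
final case class InputUsed(appId: String, dataset: String) extends DataEvent
final case class OutputWritten(appId: String, dataset: String) extends DataEvent

// Hypothetical listener interface, loosely modelled on Spark's listener pattern.
trait DataEventListener {
  def onEvent(event: DataEvent): Unit
}

// Hypothetical bus that forwards every posted event to all registered listeners.
final class DataEventBus {
  private val listeners = mutable.ArrayBuffer.empty[DataEventListener]
  def register(listener: DataEventListener): Unit = listeners += listener
  def post(event: DataEvent): Unit = listeners.foreach(_.onEvent(event))
}

// Records which datasets each application read and wrote, and derives lineage
// edges (input dataset -> output dataset). A real system would push these
// edges to a metadata store such as Apache Atlas instead of keeping them here.
final class LineageListener extends DataEventListener {
  private val inputs  = mutable.Map.empty[String, mutable.LinkedHashSet[String]]
  private val outputs = mutable.Map.empty[String, mutable.LinkedHashSet[String]]

  override def onEvent(event: DataEvent): Unit = event match {
    case InputUsed(app, ds)     => inputs.getOrElseUpdate(app, mutable.LinkedHashSet.empty[String]) += ds
    case OutputWritten(app, ds) => outputs.getOrElseUpdate(app, mutable.LinkedHashSet.empty[String]) += ds
  }

  // Coarse, application-level lineage: each output of an application is
  // linked to every input that the same application read.
  def edges: Seq[(String, String)] =
    (for {
      (app, outs) <- outputs.toSeq
      src         <- inputs.getOrElse(app, mutable.LinkedHashSet.empty[String]).toSeq
      dst         <- outs.toSeq
    } yield (src, dst)).distinct
}

// Replay a few events for two hypothetical applications.
val lineage: LineageListener = {
  val bus = new DataEventBus
  val listener = new LineageListener
  bus.register(listener)
  bus.post(InputUsed("app-1", "raw/events"))
  bus.post(InputUsed("app-1", "ref/users"))
  bus.post(OutputWritten("app-1", "clean/events"))
  bus.post(InputUsed("app-2", "clean/events")) // read only: no output, so no edge
  listener
}

lineage.edges.foreach { case (src, dst) => println(s"$src -> $dst") }
```

The lineage in this sketch is deliberately coarse: every output of an application is linked to every input it read. In real Spark 2.x code, the hook would presumably be one of Spark’s own listener interfaces (for example `SparkListener` or `QueryExecutionListener`), and finer-grained lineage would require inspecting the query plan of each write.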

Bern Scala User Group
