satRday

satRdays are community-led, regional conferences that support collaboration, networking and innovation within the R community. The BelgradeR User Group collaborates with R-powered data businesses from Serbia and beyond to provide a fantastic day of R learning, networking and workshops.

satRday Belgrade 2018 will be held on Saturday, October 27th 2018 at the Museum of Science and Technology in Belgrade.

Registration


Speakers

Keynote speakers

Mariachiara Fortuna
Data Scientist at Quantide

Speakers

Filip Rodik
Data engineer & Hacktivist (CRO)
Code For Croatia / Gong
Dragana Radojicic
Teaching Assistant (AUT)
TU Vienna
Peter Laurinec
Data Scientist (SVK)
Powerex
Marcin Kosinski
Statistician (POL)
Gradient Metrics
Radmila Velickovic
Student (SRB)
University of Neuchatel
Nadica Miljkovic
Asst. Professor (SRB)
School of Electrical Engineering, University of Belgrade
Ildiko Czeller
Data Scientist (HUN)
Emarsys
Sandro Radovanovic
Teaching Assistant (SRB)
Faculty of Organisational Sciences, University of Belgrade
Cervan Girard
Consultant (FRA)
ThinkR
Viktor Tisza
Data Scientist Team Lead (ITA)
Generali Group
Jelena Jovanovic
Researcher (SRB)
University of Belgrade
Lubomir Stepanek
Biostatistician (CZE)
Faculty of Biomedical Engineering, Czech Technical University
Andjela Todorovic
Research Associate (SRB)
Faculty of Sciences and Mathematics, University of Nis
Tamas Nagy
Assistant Professor (HUN)
Eötvös Loránd University
Steph Locke
Principal Consultant (GBR)
Locke Data
Marko Galjak
Data Scientist (SRB)
Catalyst Balkans
Judit Mokos
Biologist (HUN)
Hungarian Academy of Sciences

Workshops

Workshops will be held on October 26th 2018 at Startit Center Belgrade.

Time Subject Lecturer
09:00 - 12:30 An Introduction to Text Mining in R Jelena Jovanovic, University of Belgrade
14:00 - 17:30 Data Visualization in R Judit Mokos, Hungarian Academy of Sciences

An Introduction to Text Mining in R

This workshop will introduce participants to Text Mining (TM) methods and techniques, and enable them to develop working knowledge of TM in R. During the workshop, we will go through the overall TM process and examine each of its key phases. We will start with text preprocessing, then move to the transformation of unstructured textual content into a structured numerical format (i.e., feature creation and selection), obtaining a feature set that can serve as input to a statistical, machine learning, or graph-based algorithm for pattern mining or information extraction. Next, we will run the features through an ML algorithm, e.g. a classification algorithm, since text classification is probably the most representative TM task. Finally, we will examine and evaluate the obtained results. All steps will be done with state-of-the-practice R packages such as tidyr, dplyr, stringr, quanteda, and caret.

Workshop participants should have at least a basic level of programming experience in R. No experience with any form of text mining / text analytics is required.
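The preprocessing and feature-creation steps described above can be sketched in a few lines of base R (the workshop itself uses quanteda and caret for these steps; the two toy documents here are invented for illustration):

```r
# Minimal base-R sketch of turning raw text into a document-term matrix.
docs <- c(doc1 = "R makes text mining fun",
          doc2 = "text mining in R is fun fun")

# 1. Preprocessing: lower-case the text and split it into word tokens
tokens <- lapply(docs, function(d) strsplit(tolower(d), "\\s+")[[1]])

# 2. Feature creation: one row per document, one column per vocabulary term
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))

dtm["doc2", "fun"]  # term frequency of "fun" in doc2: 2
```

The resulting matrix is exactly the kind of structured numerical input that can be handed to a classification algorithm in the later steps of the pipeline.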

Speak visually! - Data visualization in R using ggplot2

Figures are your friends: they help you understand the main patterns in your data and show results quickly. A good figure grabs attention and helps your audience remember your work. But how do you make appealing, easy-to-understand figures?

In this workshop, we will take you step by step through the process of making graphs and figures, including the most common types of graphs, when to use them, and how to interpret them. We will start with the basic tools provided by base R, then switch to the ggplot2 package. We will also discuss how to recognize misleading graphs, and why not to use them.
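As a small taste of that progression, here is the same scatterplot of the built-in iris data drawn first with base R graphics and then with ggplot2 (the ggplot2 part is a sketch that runs only if the package is installed):

```r
# Base R graphics: quick, with everything passed as plot() arguments
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     xlab = "Sepal length", ylab = "Petal length",
     main = "Base R graphics")

# ggplot2: the same figure built up in layers (requires ggplot2 installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour = Species)) +
    geom_point() +
    labs(x = "Sepal length", y = "Petal length", title = "ggplot2")
  print(p)
}
```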

Programme

Start End
09:00 10:35
Mariachiara Fortuna: The full automation - Project and code design for a massive reporting system with R
Cervan Girard: Shiny Application - from package development to server deployment
Marko Galjak: Using R for Social Network Analysis of Philanthropy - Leveraging Relational Data for Smarter Giving
Peter Laurinec: Time Series Data Mining - from PhD to Startup
10:35 11:05 Coffee Break
11:05 12:40
Dragana Radojicic: Machine Learning in Finance
Viktor Tisza: R for cross-sell modeling
Marcin Kosinski: Multi-state churn analysis, with the subscription based product
Lubomir Stepanek: Machine-learning and R in plastic surgery - Classification and attractiveness of facial emotions
12:40 14:10 Lunch
14:10 15:45
Nadica Miljkovic: Digital biosignal processing with R
Sandro Radovanovic: White Box Clustering in R
Tamas Nagy: Meta-analysis data management with the {metamanager} package
Ildiko Czeller: The essentials to work with object-oriented systems in R
15:45 16:15 Coffee Break
16:15 17:50
Andjela Todorovic: Markov chain simulation in R
Steph Locke: SQL Server and R for real-time predictions
Filip Rodik: ETL with R
Radmila Velickovic: Potentials of R for data linkage
17:50 18:00 Official Closing

Nadica Miljkovic: Digital biosignal processing with R

Though R has gained great popularity in recent years in data science and bioinformatics, the application of R to digital signal processing is not yet widespread. "signal" is the most popular CRAN package for signal processing in R. This package contains functions for filtering, resampling, interpolation and other routines based on traditional Matlab and GNU Octave functionality.
Besides "signal", other useful basic signal processing packages (e.g. "wavelets") and specialized packages for digital biosignal processing (e.g. heart rate variability analysis with "RHRV", processing of event-related brain potentials with "erpR", analysis and evaluation of electroencephalography data with "eegkit", and routines for processing and modeling of electromyography signals with "biosignalEMG") provide a solid foundation for applying R to digital biosignal processing.
In this talk, besides a brief overview of R for biosignal processing, we will demonstrate some useful signal processing techniques implemented in R, aiming at artifact cancellation in biosignals.
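As an illustration of the kind of artifact cancellation mentioned above, the sketch below removes a simulated 50 Hz power-line artifact from a slow synthetic "biosignal" with a Butterworth low-pass filter from the "signal" package (the signal, sampling rate and cut-off are invented for illustration; real recordings would be loaded instead):

```r
fs <- 500                              # sampling frequency in Hz
t  <- seq(0, 2, by = 1 / fs)           # two seconds of samples
slow  <- sin(2 * pi * 1 * t)           # physiological component at 1 Hz
mains <- 0.3 * sin(2 * pi * 50 * t)    # power-line interference at 50 Hz
x <- slow + mains                      # contaminated recording

if (requireNamespace("signal", quietly = TRUE)) {
  # 4th-order low-pass Butterworth filter with a 10 Hz cut-off
  bf <- signal::butter(4, 10 / (fs / 2), type = "low")
  clean <- signal::filtfilt(bf, x)     # zero-phase (forward-backward) filtering
  plot(t, x, type = "l", col = "grey", xlab = "time (s)", ylab = "amplitude")
  lines(t, clean, col = "red")         # artifact removed, slow wave preserved
}
```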

Radmila Velickovic: Potentials of R for data linkage

In the sea of data, matching different datasets and extracting value can be a challenge. I will present the available resources in R for deterministic and probabilistic data linkage. Both methods will be supported with examples.
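A base-R taste of the deterministic side: two toy data sets joined on an exact key with merge(). Probabilistic linkage, also covered in the talk, is provided by CRAN packages such as RecordLinkage, which score candidate pairs by field similarity instead of requiring exact agreement. The data here are invented for illustration:

```r
patients <- data.frame(id = c(1, 2, 3),
                       name = c("Ana", "Marko", "Ivana"))
visits   <- data.frame(id = c(2, 3, 3),
                       clinic = c("A", "B", "C"))

# Deterministic linkage: rows are linked only when the key matches exactly
linked <- merge(patients, visits, by = "id")
linked   # three linked records; patient 1 has no visit and drops out
```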

Sandro Radovanovic: White Box Clustering in R

Most often we use data mining and machine learning algorithms as black boxes, where the details are hidden and the user can at best play with parameters. I would like to encourage people to use white-box algorithms instead. In the white-box approach, algorithms are disassembled into components, which allows deeper understanding and extensibility: one can change a component, or add a new one, to better suit the data at hand.
In this talk, the WhiBo clustering package (to be released soon) will be presented, along with the complete structure of a white-box representative-based clustering algorithm and examples.
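To give a flavour of the white-box idea (independently of the WhiBo package, whose API is not shown here), the base-R sketch below writes out a representative-based clustering algorithm (k-means with k = 2) as its two interchangeable components; either one could be swapped, e.g. medians instead of means in the update step:

```r
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)          # 20 points in 2-D
centers <- x[1:2, , drop = FALSE]         # initial representatives

for (iter in 1:10) {
  # Component 1: assignment - attach each point to its nearest representative
  d <- as.matrix(dist(rbind(centers, x)))[1:2, -(1:2)]
  cluster <- apply(d, 2, which.min)
  # Component 2: representative update - recompute each center as a mean
  centers <- rbind(colMeans(x[cluster == 1, , drop = FALSE]),
                   colMeans(x[cluster == 2, , drop = FALSE]))
}
table(cluster)                            # cluster sizes after the iterations
```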

Filip Rodik: ETL with R

The standard dilemma when dealing with Extract-Transform-Load tasks is SQL vs. graphical piping tools. How does R fit into that? Can time be saved by using R for tedious data wrangling on smaller projects?

Marcin Kosinski: Multi-state churn analysis, with the subscription based product

Subscriptions are no longer just for newspapers. The consumer product landscape, particularly among e-commerce firms, includes a bevy of subscription-based business models. Internet and mobile phone subscriptions are now commonplace and joining the ranks are dietary supplements, meals, clothing, cosmetics and personal grooming products.
Standard metrics to diagnose a healthy consumer-brand relationship typically include customer purchase frequency and ultimately, retention of the customer demonstrated by regular purchases. If a brand notices that a customer isn’t purchasing, it may consider targeting the customer with discount offers or deploying a tailored messaging campaign in the hope that the customer will return and not “churn”.
The churn diagnosis, however, becomes more complicated for subscription-based products, many of which offer multiple delivery frequencies and the ability to pause a subscription. Brands with subscription-based products need a reliable measure of churn propensity so they can isolate the factors that lead to churn and preemptively identify at-risk customers. During the presentation I'll show how to analyze churn propensity for products with multiple states, such as different subscription cadences or a paused subscription. If time allows, I'll also present useful plots, developed at Gradient Metrics (a quantitative marketing agency), that provide deep insights during such modeling.

Cervan Girard: Shiny Application - from package development to server deployment

To facilitate our work on Shiny applications, we designed a Shiny template included in an R package. Developing within a package framework allows for all best practices (vignettes, documentation, tests, etc.) and easier maintenance.
We will present our tricks and practices for saving time in the development of Shiny applications using our {shinytemplate} package. Then we will show how to deploy the application at scale with ShinyProxy.

Peter Laurinec: Time Series Data Mining - from PhD to Startup

The talk will focus on the differences between doing research and applying time series data mining to real business problems with real, rich data.
I will discuss why research and business need to be related, and where they diverge. Typical time series data mining tasks in energetics will be shown, with use cases in R.

Dragana Radojicic: Machine Learning in Finance

Nowadays, automatic trading agents are an inseparable part of many businesses around the world. Quantitative tools are widely adopted by hedge funds, investment banks and other financial institutions. Stock markets produce huge amounts of data, and to keep up with the pace, the technology stack of research institutions needs to adapt as well. The volume of data, however, is not the only reason: with machine learning we are able to find hidden patterns within data. To develop trading strategies and describe the behavior present in the market, one can draw on the concepts of supervised and unsupervised learning. This research is based on real market data from the past, more precisely on a data set from the Nasdaq Stock Market (the second largest exchange in the world). Unsupervised learning makes it possible to match similar points together; furthermore, supervised learning concepts (e.g. classification) make it possible to label elements, i.e. assign them to a group.

Lubomir Stepanek: Machine-learning and R in plastic surgery - Classification and attractiveness of facial emotions

Plenty of current studies conclude that human facial attractiveness perception is data-based and irrespective of the perceiver. However, the ways of analysing associations between facial geometric image data and its visual impact have always exceeded the power of classical statistical methods. What is more, current plastic surgery deals with aesthetic indications such as improving the attractiveness of a smile or other facial emotions; it should therefore take into account that the total impression of a face also depends on the facial emotion currently expressed.
In this work, we have applied machine-learning methods and the power of the R language (and some of its packages) to explore how accurate the classification of photographed faces into sets of facial emotions and their facial manifestations is, and, furthermore, which facial emotions are associated with a higher level of facial attractiveness, measured on a Likert scale by a board of independent observers.
Both profile and portrait facial image data were collected for each patient (exposed to an emotion incentive), then processed, landmarked and analysed using the R language. The sets of facial emotions and other facial manifestations used originate from the Ekman-Friesen FACS scale but were improved substantially. Naive Bayes classifiers using the e1071 package, decision trees (CART) via the tree and rpart packages and, finally, neural networks via the neuralnet package were trained to assign new face image data to one of the facial emotions.
Neural networks manifested the highest predictive accuracy when categorizing a new face into facial emotions. The geometrical shape of the mouth, then the eyebrows and finally the eyes affect, in descending order, the intensity of a classified emotion, as identified using decision trees. The mentioned R packages proved their maturity.
We performed machine-learning analyses to compare which of the classification methods, implemented via R packages, achieves the best prediction accuracy when classifying face images into facial emotions, and, additionally, to point out which facial emotions and geometric features, based on large data evidence, affect facial attractiveness the most and therefore should preferentially be addressed within plastic surgery procedures.

Andjela Todorovic: Markov chain simulation in R

The markovchain package in R is quite an effective tool for creating and analyzing Discrete-Time Markov Chains. In this talk, I will briefly review the underlying theory of Markov chains and their structural properties. Afterward, I will provide several real-world examples of Markov chains and their implementation in R, and show how to create and manipulate markovchain objects and analyze the results.
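As a minimal example, the base-R sketch below iterates a hypothetical two-state weather chain until its distribution stabilizes; the markovchain package wraps the same ideas in dedicated S4 objects (e.g. a markovchain object and its steadyStates() method), so the transition matrix and states here are invented for illustration:

```r
# Transition matrix of a two-state discrete-time Markov chain
P <- matrix(c(0.7, 0.3,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("sunny", "rainy"), c("sunny", "rainy")))

state <- c(1, 0)                             # start in "sunny"
for (i in 1:100) state <- drop(state %*% P)  # run the chain forward

round(state, 3)   # converges to the stationary distribution (4/7, 3/7)
```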

Tamas Nagy: Meta-analysis data management with the {metamanager} package

In the social and medical sciences, researchers often use meta-analysis to aggregate findings from several studies. However, conducting a meta-analysis is a time-consuming enterprise that requires not just domain-specific knowledge and analytical experience, but considerable data management skills as well. To aid reproducible research, it should be possible to handle tasks - from collecting to analyzing data - directly in R. Even though there are several useful packages for conducting the statistical part of a meta-analysis, there is a lack of packages that deal with the data management tasks that are typically needed. To fill this gap, we created the {metamanager} package. The package provides several functions for conducting reproducible meta-analysis, while the code remains human readable. Key functionality involves merging and tidying article metadata, flagging duplicates, creating files for human coding, assessing coding performance, detecting and correcting human errors, etc. The package has functions to manage spreadsheets through Google Drive, providing a front-end for manual data entry, access management, version control, and collaborative editing.

Viktor Tisza: R for cross-sell modeling

A showcase of how R supported cross-sell modeling at Generali: an introduction to the challenges and possible solutions, while discovering some useful and fun packages like mlr and packrat.

Steph Locke: SQL Server and R for real-time predictions

Embedding your R (and soon Python!) models in SQL Server enables you to add predictive capabilities to your applications and your analytics without adding expensive components or going outside your network via costly API calls.
In this demo-packed talk, you’ll see how you can go from a model built in R to making predictions on the fly in MS SQL Server 2016.

Marko Galjak: Using R for Social Network Analysis of Philanthropy - Leveraging Relational Data for Smarter Giving

Using graph theory to solve problems isn't new. However, the increasing amount of available data offers an ever-growing number of opportunities for abstracting data through graphs. The best-known example of graph abstraction is probably social interactions, but graph theory can be used to abstract many other concepts. It is widely used across disciplines: genomics, business administration, urban planning, environmental studies, the social sciences; it has also been widely adopted by security services. Apart from the insights network analyses can provide, one of the main reasons for its application across so many disciplines is its robustness and scalability, which allow calculations and clustering to be performed very efficiently even on vast networks.
Catalyst Balkans is a nonprofit intermediary support organization with a mission to broaden the domestic philanthropy ecosystem in the Western Balkans. Over the past three years, we've been collecting data on philanthropy in the Western Balkans; our database contains more than 30,000 instances of donation classified by a plethora of categories. In our givingbalkans.com application for exploring these data, we built a tool called CiviGraph for analyzing the relational aspect of our data. In our abstraction, donors and beneficiaries are represented by nodes, and instances of donation are represented as links between them. This abstraction allows various metrics to be calculated and used to obtain invaluable intelligence. Further, the visual representation of the neighborhoods formed by donors and beneficiaries can be used to explore the philanthropy landscape in the Western Balkans.
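A toy version of that donor-beneficiary abstraction can be built with the igraph package (the donors, beneficiaries and donations below are invented for illustration; CiviGraph itself is not shown):

```r
# Each row is one instance of donation: a link from a donor to a beneficiary
donations <- data.frame(
  donor       = c("Donor A", "Donor A", "Donor B"),
  beneficiary = c("Org X", "Org Y", "Org Y")
)

if (requireNamespace("igraph", quietly = TRUE)) {
  # Nodes are donors and beneficiaries; edges are donations between them
  g <- igraph::graph_from_data_frame(donations, directed = TRUE)
  igraph::degree(g)   # one simple metric: how connected each node is
}
```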

Venue

The workshop day will be held on October 26th 2018 at Startit Center Belgrade, Savska street 5.

Close to the city center, the Main Building of the Museum of Science and Technology will host the conference on October 27th.

Travel

Important Dates

Event Date
Registration start July 31st 2018
Registration end October 21st 2018
Call for Papers start July 17th 2018
Call for Papers end September 2nd 2018

Code of Conduct

satRday is dedicated to providing a harassment-free and inclusive conference experience for all in attendance regardless of, but not limited to, gender, sexual orientation, disabilities, physical attributes, age, ethnicity, social standing, religion or political affiliation.

We do not tolerate harassment of participants (including organisers and vendors) in any form. Sexual innuendos and imagery are not appropriate for any conference venue, including presentations.

Anyone violating these rules may be given warning or expelled from the conference (without a refund) at the discretion of the conference organisers.

Our code of conduct/anti-harassment policy can be found here.