Programme

09:00 - 10:35
  Mariachiara Fortuna: The full automation - Project and code design for a massive reporting system with R
  Cervan Girard: Shiny Application - from package development to server deployment
  Marko Galjak: Using R for Social Network Analysis of Philanthropy - Leveraging Relational Data for Smarter Giving
  Peter Laurinec: Time Series Data Mining - from PhD to Startup

10:35 - 11:05  Coffee Break

11:05 - 12:40
  Dragana Radojicic: Machine Learning in Finance
  Viktor Tisza: R for cross-sell modeling
  Marcin Kosinski: Multi-state churn analysis, with the subscription based product
  Lubomir Stepanek: Machine-learning and R in plastic surgery - Classification and attractiveness of facial emotions

12:40 - 14:10  Lunch

14:10 - 15:45
  Nadica Miljkovic: Digital biosignal processing with R
  Sandro Radovanovic: White Box Clustering in R
  Tamas Nagy: Meta-analysis data management with the {metamanager} package
  Ildiko Czeller: The essentials to work with object-oriented systems in R

15:45 - 16:15  Coffee Break

16:15 - 17:50
  Andjela Todorovic: Markov chain simulation in R
  Steph Locke: SQL Server and R for real-time predictions
  Filip Rodik: ETL with R
  Radmila Velickovic: Potentials of R for data linkage

17:50 - 18:00  Official Closing

Nadica Miljkovic: Digital biosignal processing with R
Though R has gained high popularity in recent years in data science and bioinformatics, the application of R to digital signal processing is not yet widespread. "signal" is the most popular CRAN package for signal processing in R. This package contains functions for filtering, resampling, interpolation and other routines based on traditional Matlab and GNU Octave functionality.
Besides "signal", other useful basic signal processing packages (e.g. "wavelets") and specialized packages for digital biosignal processing (e.g. heart rate variability analysis - the "RHRV" package; processing of event-related brain potentials - "erpR"; analysis and evaluation of electroencephalography data - "eegkit"; routines for processing and modeling of electromyography signals - "biosignalEMG") provide a solid foundation for applying R to digital biosignal processing.
In this talk, besides a brief overview of R for biosignal processing, we will demonstrate some useful signal processing techniques implemented in R, aiming at artifact cancellation in biosignals.
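As a flavour of what such techniques look like in R, here is a minimal baseline-drift (artifact) cancellation on a synthetic signal, using only base R; a real analysis would reach for proper filters such as signal::butter with signal::filtfilt. The sampling rate and drift frequency below are invented for illustration.

```r
# Synthetic "biosignal": a 10 Hz component plus slow baseline drift
set.seed(1)
fs <- 100                               # assumed sampling rate (Hz)
t  <- seq(0, 5, by = 1 / fs)
drift   <- 0.5 * sin(2 * pi * 0.2 * t)  # low-frequency artifact
raw_sig <- sin(2 * pi * 10 * t) + drift

# Estimate the drift with a centred moving average (stats::filter is base R);
# the window of 51 samples averages out the 10 Hz component but tracks the drift
win      <- rep(1 / 51, 51)
baseline <- stats::filter(raw_sig, win, sides = 2)
cleaned  <- raw_sig - baseline          # artifact-cancelled signal (NA at edges)
```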
Radmila Velickovic: Potentials of R for data linkage
In the sea of data, matching different datasets and extracting value can be a challenge. I will present the resources available in R for deterministic and probabilistic data linkage. Both methods will be supported with examples.
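The deterministic case is essentially a keyed merge, and a crude probabilistic flavour can be had with base R's edit distances; the records below are invented, and dedicated packages such as RecordLinkage implement proper probabilistic match weights.

```r
# Two toy datasets describing overlapping people, with messy name spellings
a <- data.frame(id = 1:3,
                name = c("Ana Petrovic", "Marko Ilic", "Jovana Simic"))
b <- data.frame(name = c("Ana Petrovich", "Marko Ilic", "J. Simic"),
                city = c("Nis", "Belgrade", "Novi Sad"))

# Deterministic linkage: exact key match
exact <- merge(a, b, by = "name")

# Probabilistic-flavoured linkage: Levenshtein distance as a crude score
d    <- utils::adist(a$name, b$name)   # distance matrix, rows of a vs rows of b
best <- apply(d, 1, which.min)         # closest candidate in b for each row of a
linked <- cbind(a, b[best, , drop = FALSE])
```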
Sandro Radovanovic: White Box Clustering in R
Most often we use data mining and machine learning algorithms as black boxes, where the details are hidden and the user can, at best, play with parameters. I would like to encourage people to use white-box algorithms instead. In the white-box approach, algorithms are disassembled into components, which allows deeper understanding and extensibility: one can change a component, or add a new one better suited to the data at hand.
In this talk, the WhiBo Clustering package (available soon) will be presented, along with the complete white-box structure of a representative-based clustering algorithm and examples.
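In the spirit of the white-box approach, a representative-based clusterer can be written as plain, swappable R functions; a minimal base-R sketch on synthetic data (the function names are mine, not WhiBo's API):

```r
# Each component of a k-means-style algorithm is an ordinary R function
# that can be replaced independently: initialisation, assignment, update
set.seed(42)
x <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),
           matrix(rnorm(50, mean = 4), ncol = 2))

init_reps   <- function(x, k) x[sample(nrow(x), k), , drop = FALSE]
assign_step <- function(x, reps)
  apply(x, 1, function(p) which.min(colSums((t(reps) - p)^2)))
update_step <- function(x, cl, k)
  t(sapply(1:k, function(j) colMeans(x[cl == j, , drop = FALSE])))

k    <- 2
reps <- init_reps(x, k)
for (i in 1:20) {             # fixed iteration budget for simplicity
  cl   <- assign_step(x, reps)
  reps <- update_step(x, cl, k)
}
```

Swapping, say, `init_reps` for a k-means++ seeding or `update_step` for medoids changes the algorithm without touching the rest.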
Filip Rodik: ETL with R
The standard dilemma when dealing with Extract-Transform-Load tasks is SQL vs. graphical piping tools. How does R fit in? Can time be saved by using R for tedious data wrangling on smaller projects?
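For scale, here is what a complete (if tiny) ETL round trip looks like in base R alone - the data and file names are invented for illustration:

```r
# Extract: read raw order data (a text connection stands in for a source system)
csv_in <- textConnection("order_id,customer,amount
1,acme,100
2,acme,250
3,globex,80")
orders <- read.csv(csv_in)

# Transform: aggregate revenue per customer
per_customer <- aggregate(amount ~ customer, data = orders, FUN = sum)

# Load: write the result to the target (here, a temporary CSV file)
out <- tempfile(fileext = ".csv")
write.csv(per_customer, out, row.names = FALSE)
```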
Marcin Kosinski: Multi-state churn analysis, with the subscription based product
Subscriptions are no longer just for newspapers. The consumer product landscape, particularly among e-commerce firms, includes a bevy of subscription-based business models. Internet and mobile phone subscriptions are now commonplace and joining the ranks are dietary supplements, meals, clothing, cosmetics and personal grooming products.
Standard metrics to diagnose a healthy consumer-brand relationship typically include customer purchase frequency and ultimately, retention of the customer demonstrated by regular purchases. If a brand notices that a customer isn’t purchasing, it may consider targeting the customer with discount offers or deploying a tailored messaging campaign in the hope that the customer will return and not “churn”.
The churn diagnosis, however, becomes more complicated for subscription-based products, many of which offer multiple delivery frequencies and the ability to pause a subscription. Brands with subscription-based products need to have some reliable measure of churn propensity so they can further isolate the factors that lead to churn and preemptively identify at-risk customers.
During the presentation I'll show how to analyze churn propensity for products with multiple states, such as different subscription cadences or a paused subscription. If time allows, I'll also present useful plots that provide deep insights during such modeling, which we have developed at Gradient Metrics - a quantitative marketing agency.
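As a toy illustration of the multi-state idea, transition probabilities between subscription states can be estimated from customer histories; the states and histories below are invented, and a full churn analysis would use dedicated multi-state survival tooling rather than this raw frequency estimate.

```r
# Monthly state histories for three hypothetical subscribers
histories <- list(
  c("active", "active", "paused", "active", "churned"),
  c("active", "paused", "paused", "churned"),
  c("active", "active", "active", "active")
)

# Count observed one-step transitions between states
states <- c("active", "paused", "churned")
counts <- matrix(0, 3, 3, dimnames = list(states, states))
for (h in histories)
  for (i in seq_len(length(h) - 1))
    counts[h[i], h[i + 1]] <- counts[h[i], h[i + 1]] + 1

# Row-normalise to get empirical transition probabilities
# (pmax guards the absorbing "churned" row, which has no outgoing moves)
trans <- counts / pmax(rowSums(counts), 1)
```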
Cervan Girard: Shiny Application - from package development to server deployment
To facilitate our work on Shiny applications, we designed a Shiny template included in an R package. Development within a package framework enables all the best practices (vignettes, documentation, tests, etc.) and easier maintenance.
We will present our tricks and practices for saving time in the development of Shiny applications using our {shinytemplate} package. Then we will show how to deploy the application at scale with ShinyProxy.
Peter Laurinec: Time Series Data Mining - from PhD to Startup
The talk will focus on the differences between "doing" research and applying time series data mining to real business problems on real, rich data.
I will discuss why research and business need to be related - and why, in some respects, they do not. Typical time series data mining tasks in energetics will be shown, with use cases in R.
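One typical first task of this kind - separating the seasonal pattern of a consumption-like series from its trend - fits in a few lines of base R; the data below are synthetic, not the talk's use case.

```r
# Two weeks of hourly "electricity load": daily cycle + noise
set.seed(7)
hours <- 24 * 14
load  <- 100 + 20 * sin(2 * pi * (1:hours) / 24) + rnorm(hours, sd = 2)
ts_load <- ts(load, frequency = 24)        # 24 observations per period (day)

# STL decomposition into seasonal, trend and remainder components
dec <- stl(ts_load, s.window = "periodic")
seasonal <- dec$time.series[, "seasonal"]  # the recovered daily pattern
```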
Dragana Radojicic: Machine Learning in Finance
Nowadays, automatic trading agents are an inseparable part of many businesses around the world, and quantitative tools are widely adopted by hedge funds, investment banks and other financial institutions. Stock markets produce huge amounts of data, and to keep up with the pace, the technology stack of research institutions needs to adapt. The volume of data, however, is not the only reason. Generally speaking, machine learning enables us to find hidden patterns within data. To develop trading strategies and describe the behavior present in the market, one can draw on the concepts of supervised and unsupervised learning. This research is based on real historical market data, more precisely a dataset from the Nasdaq Stock Market (the second largest exchange in the world). Via unsupervised learning it is possible to match similar points together; via supervised learning concepts (e.g. classification) it is furthermore possible to label elements, i.e. assign them to a group.
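Both paradigms can be sketched in a few lines of base R; the features and labels below are synthetic stand-ins, not the Nasdaq dataset used in the research.

```r
# Synthetic "market" features: daily returns and a crude volatility proxy
set.seed(123)
ret <- rnorm(200)
vol <- abs(ret) + rnorm(200, sd = 0.1)
up  <- as.integer(c(ret[-1] > 0, 0))   # label: next-day direction

# Unsupervised: group similar observations together
km <- kmeans(cbind(ret, vol), centers = 2)

# Supervised: learn the labels via logistic-regression classification
fit <- glm(up ~ ret + vol, family = binomial)
```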
Lubomir Stepanek: Machine-learning and R in plastic surgery - Classification and attractiveness of facial emotions
Plenty of current studies conclude that human facial attractiveness perception is data-based and largely independent of the perceiver. However, analysing the associations between facial geometric image data and their visual impact has always exceeded the power of classical statistical methods. What is more, current plastic surgery deals with aesthetic indications such as improving the attractiveness of a smile or of other facial emotions; it should therefore take into consideration that the total impression of a face also depends on the facial emotion currently expressed.
In this work, we have applied machine-learning methods and the power of the R language (and some of its packages) to explore how accurately photographed faces can be classified into sets of facial emotions and their facial manifestations, and - furthermore - which facial emotions are associated with a higher level of facial attractiveness, measured on a Likert scale by a board of independent observers.
Both profile and portrait facial image data were collected for each patient (exposed to an emotion incentive), then processed, landmarked and analysed using the R language. The sets of facial emotions and other facial manifestations used originate from the Ekman-Friesen FACS scale but were substantially refined. Naive Bayes classifiers using the e1071 package, decision trees (CART) via the tree and rpart packages and, finally, neural networks via the neuralnet package were trained to assign new face image data to one of the facial emotions.
Neural networks showed the highest predictive accuracy when categorizing a new face into facial emotions. As identified using the decision trees, the geometrical shape of the mouth, then the eyebrows and finally the eyes affect the intensity of a classified emotion, in descending order. The mentioned R packages proved their maturity.
We performed machine-learning analyses to compare which of the classification methods, implemented via R packages, achieves the best prediction accuracy when classifying face images into facial emotions, and - additionally - to point out which facial emotions and geometric features, based on large-scale data evidence, affect facial attractiveness the most and should therefore be addressed preferentially within plastic surgery procedures.
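As a rough sketch of the decision-tree step, here is the {rpart} train-and-predict pattern applied to a built-in dataset standing in for the (non-public) landmark coordinates and emotion labels:

```r
# CART decision tree: iris species stand in for emotion classes,
# its measurements stand in for facial landmark features
library(rpart)

set.seed(1)
idx   <- sample(nrow(iris), 100)                  # training rows
tree  <- rpart(Species ~ ., data = iris[idx, ])   # grow the tree
preds <- predict(tree, iris[-idx, ], type = "class")
acc   <- mean(preds == iris$Species[-idx])        # held-out accuracy
```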
Andjela Todorovic: Markov chain simulation in R
The markovchain package in R is quite an effective tool for creating and analyzing discrete-time Markov chains. In this talk, I will briefly review the underlying theory of Markov chains and their structural properties. Afterwards, I will provide several real-world examples of Markov chains and their implementation in R, and show how to create and manipulate markovchain objects and analyze the results.
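Before reaching for the package's dedicated classes, the simulation itself can be sketched in base R; the two-state weather chain below is a standard textbook toy, not an example from the talk.

```r
# Transition matrix of a two-state chain (rows sum to 1)
P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE,
            dimnames = list(c("sunny", "rainy"), c("sunny", "rainy")))

# Simulate n steps: each state is drawn using the previous state's row of P
simulate_chain <- function(P, n, start) {
  states <- character(n)
  states[1] <- start
  for (i in 2:n)
    states[i] <- sample(colnames(P), 1, prob = P[states[i - 1], ])
  states
}

set.seed(99)
path <- simulate_chain(P, 1000, "sunny")
```

The long-run fraction of "sunny" steps approaches the stationary probability 0.75, which the markovchain package would compute analytically.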
Tamas Nagy: Meta-analysis data management with the {metamanager} package
In the social and medical sciences, researchers often use meta-analysis to aggregate findings from several studies. However, conducting a meta-analysis is a time-consuming enterprise that requires not just domain-specific knowledge and analytical experience but considerable data management skills as well. To aid reproducible research, it should be possible to handle all tasks - from collecting to analyzing data - directly in R. Even though there are several useful packages for the statistical part of a meta-analysis, there is a lack of packages dealing with the data management tasks that are typically needed. To fill this gap, we created the {metamanager} package. It provides several functions for conducting reproducible meta-analysis while keeping the code human readable. Key functionality includes merging and tidying article metadata, flagging duplicates, creating files for human coding, assessing coding performance, and detecting and correcting human errors. The package also has functions to manage spreadsheets through Google Drive, providing a front end for manual data entry, access management, version control, and collaborative editing.
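One of these tasks, flagging duplicate articles across database exports, boils down to normalising metadata before comparison. A base-R sketch with invented records (this illustrates the task, not {metamanager}'s actual API):

```r
# Article metadata as it might arrive from two literature databases
articles <- data.frame(
  title = c("Effects of X on Y", "effects of x on y ", "A Different Study"),
  year  = c(2019, 2019, 2020)
)

# Normalise titles (case, stray whitespace) before checking for duplicates
norm_title <- tolower(trimws(articles$title))
articles$duplicate <- duplicated(paste(norm_title, articles$year))
```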
Viktor Tisza: R for cross-sell modeling
A showcase of how R supported cross-sell modeling at Generali: an introduction to the challenges and possible solutions, while discovering some useful and fun packages along the way, such as mlr and packrat.
Steph Locke: SQL Server and R for real-time predictions
Embedding your R (and soon Python!) models in SQL Server lets you add predictive capabilities to your applications and your analytics without adding expensive components or going outside your network via costly API calls.
In this demo-packed talk, you’ll see how you can go from a model built in R to making predictions on the fly in MS SQL Server 2016.
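On the R side, a fitted model is typically serialised to a binary blob so it can be stored in a SQL Server table and later scored via the sp_execute_external_script procedure. A base-R sketch of that round trip (the glm model here is just a stand-in, not the talk's demo):

```r
# Train any R model; serialise it to a raw vector suitable for a
# VARBINARY(MAX) column in SQL Server
fit  <- glm(vs ~ mpg + wt, data = mtcars, family = binomial)
blob <- serialize(fit, connection = NULL)

# Scoring side: unserialise the stored blob and predict on new rows
model  <- unserialize(blob)
scores <- predict(model, mtcars[1:3, ], type = "response")
```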
Marko Galjak: Using R for Social Network Analysis of Philanthropy - Leveraging Relational Data for Smarter Giving
Using graph theory to solve problems isn't new. However, the increasing amount of available data offers an ever-growing number of opportunities for abstracting data through graphs. The best-known example of graph abstraction is probably social interactions, but graph theory can be used to abstract many other concepts. It is widely used in many disciplines - genomics, business administration, urban planning, environmental studies, the social sciences - and has already been widely adopted by security services. Apart from the insights network analysis can provide, one of the main reasons for its application across so many disciplines is its robustness and scalability, which allow calculations and clustering to be performed very efficiently even on vast networks.
Catalyst Balkans is a nonprofit intermediary support organization with a mission to broaden the domestic philanthropy ecosystem in the Western Balkans. Over the past three years, we have been collecting data on philanthropy in the region; our database contains more than 30,000 instances of donation, classified by a plethora of categories. In our givingbalkans.com application for exploring these data, we built a tool called CiviGraph for analyzing the relational aspect of the data. In our abstraction, donors and beneficiaries are represented by nodes, and instances of donation are represented as links between them. This abstraction allows various metrics to be calculated that can yield invaluable intelligence. Further, the visual representation of the neighborhoods formed by donors and beneficiaries can be used to explore the philanthropy landscape in the Western Balkans.
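The donor-beneficiary abstraction can be sketched in a few lines of base R; the donation records below are invented, and a real analysis would use a graph package such as igraph for richer network metrics.

```r
# Donation records as an edge list of a bipartite donor-beneficiary graph
donations <- data.frame(
  donor       = c("FoundationA", "FoundationA", "CompanyB", "PersonC"),
  beneficiary = c("SchoolX", "HospitalY", "SchoolX", "SchoolX")
)

# Node degrees fall straight out of frequency tables:
donor_degree       <- table(donations$donor)        # donations made
beneficiary_degree <- table(donations$beneficiary)  # donations received
```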