Blog entries

Prophet vs BSTS

Sisifo's Page, 3 January 2021. Keywords: prophet vs BSTS, time series prediction, MineThatData forecasting challenge.

Can a model like Prophet, far simpler to fit than BSTS, achieve similar accuracy? Let's use the winning solution of the MineThatData Forecasting Challenge, built with BSTS, as the benchmark for the comparison.

Work in progress - new posts

4 January 2020. Keywords: pulsioximeter prediction, deep learning time series, data augmentation time series, desaturation alarm prediction, pulsioximeter deep learning, deep learning time series data augmentation, Visualization of Association Rules, R code, netCoin R package, frequent itemsets, visualization frequent itemsets, apriori, interpretation a priori.

Under reconstruction! This site may look broken in some browsers or at some screen resolutions.

In the meantime, here are links to the most recent posts:

Uplift modeling: what if the split between treatment and control is not random

12 September 2018. Keywords: uplift modeling, not random split treatment and control groups, problems with uplift, negative qini, negative gain, bad uplift classifier.

Uplift modeling does not work well when the split between targeted and non-targeted customers is not random. A reader of a previous post raised the problem; here it is reproduced with a public dataset. The symptom: the Qini area is negative, while the other diagnostics for the classifier look fine. The post includes R code and a dataset, so it can be used as a tutorial.
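To make the symptom concrete, here is a minimal Python sketch of a Qini curve and its area, assuming the common incremental-responders definition: customers are ranked by predicted uplift and, for the top k, Q(k) = R_t(k) - R_c(k) * N_t(k) / N_c(k), where R and N are the cumulative responders and customers in the treatment (t) and control (c) groups. The function names are hypothetical, and the post's R code may define the curve slightly differently. The area is measured against the straight line of random targeting, so a negative value means the ranking does worse than chance.

```python
from typing import List


def qini_curve(scores: List[float], treated: List[int], responded: List[int]) -> List[float]:
    """Incremental responders Q(k) for the top-k customers ranked by predicted uplift.

    Q(k) = R_t(k) - R_c(k) * N_t(k) / N_c(k), with Q(0) = 0.
    treated and responded are 0/1 flags, one per customer.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = r_t = r_c = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            r_t += responded[i]
        else:
            n_c += 1
            r_c += responded[i]
        # Control responders rescaled to the size of the treated group so far;
        # taken as 0 until at least one control customer has been seen.
        expected_without_action = r_c * n_t / n_c if n_c else 0.0
        curve.append(r_t - expected_without_action)
    return curve


def qini_area(curve: List[float]) -> float:
    """Trapezoidal area between the Qini curve and the random-targeting line."""
    n = len(curve) - 1
    final = curve[-1]
    area = 0.0
    for k in range(1, n + 1):
        # Random targeting is the straight line from (0, 0) to (n, final)
        prev_gap = curve[k - 1] - final * (k - 1) / n
        gap = curve[k] - final * k / n
        area += (prev_gap + gap) / 2
    return area
```

The control term R_c(k) * N_t(k) / N_c(k) is only a fair estimate of how the treated customers would have behaved without the action if both groups are comparable; with a non-random split that assumption fails, which is one plausible way the area can turn negative.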

Clustering time series using DTW: pulsioximeter data

5 November 2017. Keywords: apply clustering to time series using Dynamic Time Warping, R code, Dynamic Time Warping for pulsioximeter data, DTW, pulsioximeter dataset.

This time the data come from a pulsioximeter (instead of the accelerometer of previous posts), which again generates a large number of time series: the device is switched on every night with a patient, and a few nights from a healthier person are added for comparison. As a step in an initial exploratory analysis, the objective is to cluster this set of time series using Dynamic Time Warping (DTW) as the distance measure. The post includes R code and a dataset, so it can be used as a tutorial.
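A minimal sketch of the distance measure, in Python rather than the post's R: classic dynamic-programming DTW with the absolute difference as the local cost and no warping window (both are assumptions, not necessarily the post's settings), plus a symmetric pairwise distance matrix. The clustering step itself can then run on that matrix with any method that accepts precomputed distances, such as hierarchical clustering.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) DTW with absolute difference as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to the previous b point
                cost[i][j - 1],      # b[j-1] also matched to the previous a point
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def distance_matrix(series: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances, zeros on the diagonal."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(series[i], series[j])
    return d
```

Unlike the Euclidean distance, DTW lets two nights with the same shape but shifted or stretched events (e.g. a desaturation happening earlier) end up close to each other.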

Applying neural nets to time series: accelerometer data from wearable

29 January 2016. Keywords: apply neural network to time series, R code, detecting physical activity levels, accelerometer data, wearables, neural net.

For a somewhat more advanced analysis of accelerometer data, far-from-obvious time series concepts start to be needed. But are they really needed? Maybe not, since one could always apply a neural network to solve the problem. Is this second path easier in practice? The post includes R code and a dataset, so it can be used as a tutorial.

Wearables: measuring physical activity from accelerometer data

6 November 2016. Keywords: wearables, analysis, R code, detecting physical activity levels, acceleration signal vector magnitude.

The very first analysis to try on accelerometer data: measuring the physical activity of the user. The simplest statistics already tell a lot, in particular the standard deviation of the Signal Vector Magnitude (SVM, i.e. the modulus of the acceleration vector). The post includes R code and a dataset, so it can be used as a tutorial.
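A minimal Python sketch of that statistic, assuming 3-axis samples (x, y, z) in m/s² and consecutive, non-overlapping windows of a fixed number of samples (the window length is a hypothetical parameter, not taken from the post). A device at rest has an SVM close to g (about 9.81 m/s²) whatever its orientation, so the standard deviation stays near 0; movement makes the SVM fluctuate and raises it.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration, assumed in m/s^2


def signal_vector_magnitude(samples: Sequence[Sample]) -> List[float]:
    """Modulus sqrt(x^2 + y^2 + z^2) of each 3-axis sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def activity_per_window(samples: Sequence[Sample], window: int) -> List[float]:
    """Population standard deviation of the SVM in consecutive, non-overlapping windows.

    A trailing window with fewer than `window` samples is dropped.
    """
    svm = signal_vector_magnitude(samples)
    return [
        statistics.pstdev(svm[start:start + window])
        for start in range(0, len(svm) - window + 1, window)
    ]
```

The population standard deviation (`pstdev`) is a choice made here; with windows of many samples the difference from the sample standard deviation is negligible.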

Upstreaming data from a wearable device to a server: making a robust Android app for the lab

1 November 2016. Keywords: getting data from a wearable upstream, Android app, activity and service, Android Processes and Application Life Cycle.

Still setting up a lab for a number of wearables, in which all the raw sensor data (e.g. from the accelerometer) is sent upstream to a server for later analysis. This post draws some conclusions on how to make the app robust enough that users can do whatever they like with their smartphones during the experiment.

Upstreaming data from a wearable device to a server: an Android app for a data lab

31 October 2016. Keywords: getting data from a wearable upstream, Android app, data lab, accelerometer data analysis.

Setting up a lab for a number of wearables, in which all the raw sensor data (e.g. from the accelerometer) is sent upstream to a server for later analysis. Some requirements are uncommon for an Android app, e.g. a non-negligible volume of upstream data. Includes detailed explanations and Java code.

Google Analytics: how to figure out sessions when a custom user id is not set (follow up on Association Rules)

12 June 2016. Keywords: figure out sessions of Google Analytics, clickstream, Association Rules, custom user id.

How to identify sessions in Google Analytics records when your customer has not set the custom user id dimension (a follow-up to the Association Rules post). The same approach also identifies user sessions in other cases with GA data, e.g. when associating events with sessions. Includes the R code.
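The post's exact procedure is in its R code; as a generic illustration only, here is a Python sketch of inactivity-based sessionization. It assumes each hit carries some client identifier (the field name `client_id` is hypothetical) and a timestamp, and it uses a 30-minute inactivity timeout, which mirrors GA's default session timeout. It ignores GA's other session-ending rules, such as the session ending at midnight or on a change of campaign.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Assumption: a 30-minute inactivity timeout, i.e. GA's default session timeout.
SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(hits: List[Tuple[str, datetime]]) -> List[int]:
    """Return a session number for each (client_id, timestamp) hit, in input order.

    A new session starts on a client's first hit, or when more than
    SESSION_TIMEOUT has passed since that client's previous hit.
    """
    # Process hits grouped by client and in chronological order
    order = sorted(range(len(hits)), key=lambda i: hits[i])
    session_of_hit = [0] * len(hits)
    last_seen: Dict[str, datetime] = {}
    current: Dict[str, int] = {}
    next_id = 0
    for i in order:
        client, ts = hits[i]
        if client not in last_seen or ts - last_seen[client] > SESSION_TIMEOUT:
            next_id += 1
            current[client] = next_id
        session_of_hit[i] = current[client]
        last_seen[client] = ts
    return session_of_hit
```

Events can then be associated with sessions by running them through the same function together with the page views of the same client.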

Association Rules applied to Google Analytics data for insights on the structure of a web site

16 May 2016. Keywords: apriori for clickstream data, structure of the site as seen by the users, collaborative filtering.

How to mine Association Rules on clickstream data from Google Analytics, as a way to understand how users navigate your web site. Includes details on how to configure GA and extract the data into R, plus the R code for mining the rules. A seed for a recommendation system on your web site.
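As an illustration of the metrics involved, here is a brute-force Python sketch that mines only one-page-to-one-page rules {A} → {B} from sessions represented as sets of visited pages. It is not the Apriori algorithm (which prunes candidate itemsets of any size), and the thresholds are hypothetical parameters; it only shows how support, confidence and lift are computed.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[str, str, float, float, float]  # (lhs, rhs, support, confidence, lift)


def pair_rules(sessions: List[Set[str]], min_support: float, min_confidence: float) -> List[Rule]:
    """Brute-force one-to-one rules {lhs} -> {rhs} over sessions given as sets of pages."""
    n = len(sessions)
    page_count: Dict[str, int] = {}
    pair_count: Dict[FrozenSet[str], int] = {}
    for pages in sessions:
        for page in pages:
            page_count[page] = page_count.get(page, 0) + 1
        for a, b in combinations(sorted(pages), 2):
            key = frozenset((a, b))
            pair_count[key] = pair_count.get(key, 0) + 1
    rules: List[Rule] = []
    for pair, count in pair_count.items():
        support = count / n  # share of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in permutations(sorted(pair), 2):
            confidence = count / page_count[lhs]  # P(rhs in session | lhs in session)
            if confidence >= min_confidence:
                lift = confidence / (page_count[rhs] / n)  # > 1: co-visited more than by chance
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules)
```

Confidence is asymmetric while support and lift are not, which is why each frequent pair can yield up to two rules.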

Cost/benefit evaluation of Uplift Modeling with an example in R

24 September 2016. Keywords: scenarios relevant for uplift modeling, evaluate them from a business perspective.

Is there a way to distinguish the customers that respond (or not) to a marketing action when targeted, from the customers that respond (or not) when they are not targeted? Which scenarios are relevant for uplift modeling, compared to a “traditional” model? How to evaluate them from a business perspective? How to fit the uplift model in R?
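A minimal Python sketch of the business side, under a simple assumed cost model that is not necessarily the one in the post: each targeted customer costs `cost`, each extra response caused by the action is worth `margin`, and `uplift` holds the model's estimated increase in response probability per customer (all three names are hypothetical). Targeting the top k customers by estimated uplift then yields an expected net benefit equal to the sum of uplift * margin - cost over them, and the sketch picks the k that maximizes it.

```python
from typing import List, Tuple


def best_targeting(uplift: List[float], margin: float, cost: float) -> Tuple[int, float]:
    """Return (k, expected net benefit) for the best number of top-ranked customers to target.

    Expected net benefit of targeting one customer = uplift * margin - cost,
    where uplift is the estimated increase in response probability due to the action.
    Targeting nobody (k = 0) has a benefit of 0.
    """
    ranked = sorted(uplift, reverse=True)
    best_k, best_profit, profit = 0, 0.0, 0.0
    for k, u in enumerate(ranked, start=1):
        profit += u * margin - cost
        if profit > best_profit:
            best_k, best_profit = k, profit
    return best_k, best_profit
```

Customers with a negative estimated uplift (those who respond less when targeted) can only reduce the benefit, which is exactly what a "traditional" response model, ranking by response probability alone, cannot detect.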