Using the first 10 minutes of data from approximately 10K games to predict the outcomes of high-ELO ranked games

League of Legends: Blue Team (Left) vs. Red Team (Right) | Image by Author

Introduction

League of Legends is a team-based strategy game where two teams of five powerful champions face off to destroy the other’s base. (https://na.leagueoflegends.com/en-us/how-to-play/)

A typical League of Legends game tends to last 30 to 45 minutes, and each game can be divided into three phases: the laning phase, mid game, and late game. Players normally spend the first 10 to 15 minutes farming in their own lanes (Top, Mid, Bot, JG) to gain early advantages in builds and levels. In the mid game, players start to focus on the macro level: push lanes, take down towers, get map…


Plot by author

Introduction

K-means clustering is one of the most popular unsupervised learning methods in machine learning. This algorithm helps identify “k” possible groups (clusters) from “n” elements based on the distance between the elements.

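A more detailed explanation would be: given a chosen number of centroids “k”, the algorithm computes the distance from each element in your data to every centroid, assigns each element to its nearest centroid to form clusters, then recomputes each centroid from its members and repeats. The ultimate goal is to keep each cluster as compact as possible, that is, to keep the distances between the elements and their own centroid small.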
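The assign-and-update loop described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the code from the article: the function names (`squared_distance`, `mean_point`, `kmeans`), the toy points, and the choice to start from randomly sampled elements are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Very small k-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct elements of the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each element goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two visibly separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice a library implementation (with smarter initialization and multiple restarts) would usually be preferred; the sketch is only meant to make the two alternating steps concrete.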

K-means can be used in multiple ways, for instance, customer segmentation, insurance fraud detection, and document classification.

One…


A walk-through of the setup, diagnostic tests, and evaluation of a linear regression model in R

Photo by Author

Introduction

R is a great free software environment for statistical analysis and graphics. In this blog, I will demonstrate how to do linear regression analysis in R by analyzing correlations between the independent variables and dependent variables, estimating and fitting a model, and evaluating the results' usefulness and effectiveness.

I think RStudio's interface (the most commonly used editor for R) is more user-friendly than the editors that I have used with Python. …


Nowadays, we have all become so used to talking about deep learning, big data, and neural networks that we seem to forget that, even though those big topics are prospering, not every business will need them, at least for now. Thus, I want to share a little bit of my experience with common statistical topics. In this and the next blog, I am going to demonstrate how to do a linear regression analysis with Python and R, respectively.

Project Summary

This project provided by Kaggle includes a dataset of 79 explanatory variables describing aspects of residential homes in Ames, Iowa. …


Compared the Porter, Snowball, and Lancaster stemmers. Integrated the comments into the customer data to improve the performance of the models.

Image Source

Content Table

Project Summary

This project aimed to study how unstructured text data could add value to a machine learning classification project.

I had two datasets from a company. One of them contained all the basic information about the customers. The customer dataset also had the target column, which indicated whether the customer still held a membership or had already canceled it. The other dataset contained the comments that the customers left for the company. …


This is a walk-through of how to apply a weighted average ensemble to improve your prediction scores.

Purpose of This Blog

Last week, one of my machine learning class assignments asked us to perform ensemble predictions by combining predictions from various algorithms. I found this topic fascinating; however, there are not many sources online that explain the concept simply. So, I decided to write this blog, to the best of my ability, for people who are looking for a straightforward solution.
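For readers new to the idea, combining predictions with a weighted average can be sketched as follows. This is a minimal sketch under stated assumptions, not the approach the blog itself walks through: the function name `weighted_average`, the three model outputs, and the weights are all hypothetical.

```python
def weighted_average(predictions, weights):
    """Blend per-model predictions into one list using the given weights.

    predictions: list of equal-length lists, one list per model.
    weights: one non-negative number per model (they need not sum to 1).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_samples = len(predictions[0])
    return [
        sum(w * model[i] for model, w in zip(predictions, weights)) / total
        for i in range(n_samples)
    ]


# Hypothetical predicted probabilities from three models on three samples.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.5]
model_c = [0.8, 0.1, 0.9]

# Assumption: model_a is trusted twice as much as the other two.
blended = weighted_average([model_a, model_b, model_c], weights=[2, 1, 1])
print(blended)
```

How the weights are chosen (for example, from validation scores) is a separate design decision that this sketch leaves open.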

During my research on this topic, I found there are many different styles and approaches to…


Opinion

The new “dollar tree” business in the data science industry is booming. But it is also very disappointing.

https://unsplash.com/photos/njcCsp58sDc

The impetuous era

“I’m a data scientist now! Check out my certification of data analysis…”

Someone posted this with a picture of a certificate on one of my social apps a few days ago. It was approximately the 10th time I had seen this kind of post in the past three months. I barely know the person who wrote the post, but what I do know is that he's a sports management major who even had problems dealing with Excel sheets.

It seems so easy now for people to call themselves data scientists or data engineers.

I have to clarify myself here before I go any…


Research

A demonstration of calculating Boilerplate with 30 telecommunication companies’ CSRs.

Image by Dariusz Sankowski from Pixabay

What is Boilerplate and why bother to remove it?

In textual analysis, Boilerplate is a combination of words that can be removed from a sentence without significantly changing the original meaning, such as “more than million in” or “at the end of.” According to Mark Lang and Lorien Stice-Lawrence, Boilerplate has become a particularly problematic attribute in annual reports, identified by regulators and standard setters, and their analysis showed that annual report disclosure (from over 15,000 companies) improved as Boilerplate was reduced. The details of the study can be found here.

One of the reasons why Boilerplate affects the quality of annual disclosure is that Boilerplate may provide opportunities to…


Research

Analyzing Electronic Health Records (EHR) of ICU patients and developing machine learning models

This analysis is part of a project focusing on analyzing Electronic Health Records (EHR) of ICU patients and developing machine learning models for early prediction of diseases. In this article, we show how to create a network of diseases using EHR records and generate network embeddings using the adjacency matrix or an edge list of the disease network. We use Python, R, and Gephi, along with Node2Vec, NetworkX, and K-means, for the analysis. We used RStudio, Spyder, and Jupyter Notebook as IDEs.
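To make the two network representations concrete, here is a minimal sketch that builds a weighted edge list and the matching adjacency matrix using only the Python standard library. The excerpt does not say how edges are defined, so linking two diagnoses whenever they appear in the same visit is an assumption made for this example; the toy records and the helper names `build_edge_list` and `to_adjacency_matrix` are also hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each inner list holds the diagnoses of one visit.
visits = [
    ["sepsis", "pneumonia", "hypertension"],
    ["pneumonia", "hypertension"],
    ["sepsis", "pneumonia"],
    ["diabetes"],
]


def build_edge_list(visits):
    """Count how often each pair of diagnoses co-occurs in the same visit."""
    edge_weights = Counter()
    for diagnoses in visits:
        # Sort the unique diagnoses so each pair has one canonical order.
        for a, b in combinations(sorted(set(diagnoses)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def to_adjacency_matrix(edge_weights, nodes):
    """Turn the weighted edge list into a symmetric adjacency matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (a, b), weight in edge_weights.items():
        i, j = index[a], index[b]
        matrix[i][j] = weight
        matrix[j][i] = weight  # undirected network, so mirror the entry
    return matrix


edges = build_edge_list(visits)
nodes = sorted({d for visit in visits for d in visit})
adjacency = to_adjacency_matrix(edges, nodes)

for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Either representation could then be handed to a graph library or an embedding method; the sketch stops at constructing them.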

Preview of the Dataset

The raw data contains 2,710,672 patient visit records with 3,933 unique diagnoses. …

Jinhang Jiang

M.S. Business Analytics at ASU W.P. Carey 21’ | Basketball Coach | jinhangjiang.github.io
