Categories
Reflection Weekly Meet and Discussion

Weekly Meeting

19 August 2022

In the weekly meeting, we discussed the progress each group had made. Most of the groups were still focused on completing all of the prerequisite courses before starting the project itself, so dr. Yayan, along with dr. Budi and dr. Mulya, recommended working on the project alongside the prerequisites to save time. Hearing that, our group agreed to start the project right away, building on the progress we had already made in the prerequisite courses. Aside from that, as usual, dr. Yayan also prompted us with a learning issue about the application of artificial intelligence in medicine. This week, the topic was the use of machine learning to predict stunting in children using data trends and analysis. Most of the students agreed with the idea, but dr. Mulya argued that there are some ethical constraints. This made me wonder: how far can AI go in the field of medicine?

Categories
Learning R Project and Assignments Reflection

Project Prerequisites

18 August 2022

Our group was assigned the project titled “Reducing Traffic Mortality in the USA” on DataCamp. The project involves several data analysis techniques, including data manipulation, data visualization, machine learning, and importing and cleaning data. To understand how these methods work, we have to complete some prerequisite courses.

The prerequisite courses on R are:

  • Unsupervised learning in R. This technique aims to find patterns in data without trying to make predictions. The course also introduces clustering and dimensionality reduction (PCA) techniques.
  • Introduction to the tidyverse. The tidyverse is a collection of R packages that provides techniques for data manipulation and visualization, mostly through dplyr and ggplot2. With data manipulation, we can filter, sort, and summarize a dataset. We can then turn the processed data into line plots, bar plots, histograms, and other charts using the ggplot2 package.
  • Intermediate regression in R. In this course, we learn about statistical models, especially linear regression. Linear regression is a tool for exploring relationships between variables in a dataset. The course also covers how interactions between variables can affect predictions.
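
To make the idea of filtering, sorting, and summarizing more concrete, here is a minimal sketch in plain Python rather than dplyr (the course itself uses R). The records and field names are made up for illustration and do not come from the project data.

```python
# Hypothetical toy records; the names and numbers are made up for illustration.
records = [
    {"region": "A", "year": 2020, "cases": 12},
    {"region": "B", "year": 2020, "cases": 30},
    {"region": "A", "year": 2021, "cases": 18},
    {"region": "B", "year": 2021, "cases": 24},
]

# Filter: keep only the rows from 2021 (similar in spirit to dplyr's filter()).
rows_2021 = [r for r in records if r["year"] == 2021]

# Sort: order all rows by cases, largest first (similar to arrange(desc(cases))).
sorted_rows = sorted(records, key=lambda r: r["cases"], reverse=True)

# Summarize: mean cases per region (similar to group_by() plus summarize()).
grouped = {}
for r in records:
    grouped.setdefault(r["region"], []).append(r["cases"])
mean_cases = {region: sum(v) / len(v) for region, v in grouped.items()}

print(rows_2021)
print(sorted_rows[0])
print(mean_cases)
```

The same three operations are what the dplyr verbs do on a data frame; only the syntax differs.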

 

Then, taking a look at the project’s task instructions, I summarized what each step requires:

  1. The first step is to read the data and get an overview of it. This step requires us to import the data and manipulate it in order to generate an overview of the data frame. This task requires basic knowledge of tidyverse.
  2. The second step is to create a textual and graphical overview of the data. This step requires us to visualize the data that has been manipulated (structured) before. To create the visualization, we use the ggplot function from the tidyverse. This task also requires basic knowledge of tidyverse.
  3. The third step is to explore the correlation between variables in the data frame. This requires us to be able to compute correlations or fit a linear regression.
  4. The fourth step is to fit a multivariate linear regression. Multivariate here means that the model involves two or more explanatory variables, so we fit all of them together to see not only the relationship between two variables, but also how it holds up when the other variables are taken into account.
  5. The fifth step is to perform PCA on standardized data. PCA (principal component analysis) finds linear combinations of the existing attributes in the data. This technique allows us to visualize the variation present in a dataset. This requires basic knowledge of unsupervised learning in R.
  6. The sixth step is to visualize the first two principal components. The first two principal component scores produced by PCA are extracted into a data frame, which we can then visualize using ggplot. This requires basic knowledge of tidyverse and unsupervised learning in R.
  7. The seventh step is to make clusters out of similar states in the data. Clustering is a way to partition a data set into several groups based on their similarities. The kmeans function is used in this step to create the clusters. This requires basic knowledge of unsupervised learning in R.
  8. The eighth step is to visualize the clusters found by the kmeans function in the PCA scatter plot. Visualizing clusters requires basic knowledge of tidyverse and ggplot.
  9. The ninth step is to visualize feature differences between clusters. This visualization also requires basic knowledge of tidyverse and ggplot.
  10. The tenth step is to compute numbers within each cluster. To do this, we need to manipulate the data in order to get the right numbers. Then, we can visualize them with ggplot. This requires basic knowledge of tidyverse.
  11. The last step is to make a decision based on the results. This is the step that determines which cluster should be the focus for policy intervention in the project.
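
Two of the building blocks in these steps, correlation (step 3) and standardizing data before PCA (step 5), can be sketched in plain Python. This is only an illustration under assumptions: the project itself is done in R, the helper names `pearson` and `standardize` are my own, and the numbers are made up.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1 (z-scores)."""
    m = mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


# Hypothetical toy data, not taken from the project.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson(x, y)     # close to 1 because y rises almost linearly with x
z = standardize(x)    # same shape as x, but centered and rescaled

print(round(r, 3))
print([round(v, 3) for v in z])
```

Standardizing matters for PCA because variables measured on larger scales would otherwise dominate the components.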

It takes time to fully comprehend the concept of the project and how each prerequisite is applied in it.

Categories
Reflection Weekly Meet and Discussion

Discussion and Project Assignment

10 August 2022

The second weekly meeting was led by dr. Yayan, who opened a discussion on the documentary film AlphaGo, which raises the question of how AI can be used in the medical field. It was interesting because a friend once asked me the same question about the application of AI in medicine. Hopefully, by taking this course, I will be able to state my own argument after gaining some knowledge of machine learning. Other than that, dr. Yayan also informed us about the three groups that were assigned last week. I am excited about this project because I should gain a lot of knowledge and practice in machine learning and data analysis by the end of the course.

P.S.: The discussion also got me thinking about how to use data science to reduce the incidence of a certain disease or outbreak. Let’s see what idea or innovation I can come up with by the end of the elective program. Wait for it!

Categories
Learning Python Reflection

Introduction to Python

5 August 2022

Besides R, we were also introduced to Python, another popular programming language, which is said to be more complex to learn than R.

As dr. Mulya said in the class, Python is commonly used for its:

  • Open-source nature
  • Massive and active community
  • Beginner-friendly nature
  • Simple syntax
  • Object-oriented design

Python is also known as an interpreted language, meaning the source code we write is executed by an interpreter to produce the output. Python’s versatility also lends itself to applications in web development, software development, games and 3D graphics, business applications, network programming, database access, and more, some of which are outside what R is typically used for. Applications known to be developed with Python include Pinterest, Instagram, Spotify, Disqus, Dropbox, Uber, Reddit, and Netflix, some of which we use in our daily lives.

In one session of this class, we also learned about indentation. Indentation refers to the spaces at the beginning of a line of code. In Python, indentation is very important because it is used to indicate a block of code. Here is an example:

Indentation of Python
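
A minimal sketch of how indentation marks a block (the function name `describe` is made up for illustration and is not from the class material):

```python
def describe(number):
    # The indented lines below belong to the function body.
    if number % 2 == 0:
        # Indented one level deeper: runs only when the condition is true.
        return "even"
    # Back at function-body level: runs when the condition is false.
    return "odd"


for n in [1, 2, 3]:
    # Indented under the for loop, so it runs once per item.
    print(n, describe(n))

# Not indented, so it runs once after the loop finishes.
print("done")
```

If the indentation inside a block is inconsistent, Python raises an IndentationError instead of running the code.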

We also had a live demonstration of Python syntax and indentation using Google Colab (image attached above), although I still need to learn more about it. Python also has some advantages compared to R, one of which is data visualization. Python data visualizations, as far as I understand, are somewhat more readable than those in R. However, I much prefer R to Python for its practicality in analyzing data.

Categories
Learning R Reflection

Introduction to R

4 August 2022

R was the very first programming language I encountered in this class. To my understanding, and as dr. Budi also explained, R is widely used for data analysis. Hence, it is necessary for data science students to learn it and adapt to it. The more I learn about it, the more I’m aware of the usefulness of R when working with data, and the more I’m interested in how it can be applied in the world of medicine.

During the first session of the class, dr. Budi explained R in general and how exactly it is used in data analysis. As I followed along, I realized that, for me, learning data science through theory alone makes it more difficult and complex. I didn’t understand it well until the second session, where dr. Budi guided us through RStudio so we could learn it directly on our own devices. The first things I learned in R were the summary, mean, and ggplot functions, followed by installing packages in RStudio. That was when I decided to explore R myself using RStudio.

Luckily, at the end of the class, dr. Budi assigned us a task to explore RStudio with the data set he gave us. This way, I can take my time to learn as much data analysis in R as I can. Let’s see where this takes me.