Photo by Finn Mund on Unsplash

Without a sophisticated program, machines do not have the capability to process human languages the way our brains do. However, computers are extremely powerful at processing numbers and anything with mathematical structure. Therefore, in a Natural Language Processing pipeline, it is important to transform raw text into a numerical representation so that it can be ingested by a model further down the pipeline.

As with any machine learning task, it’s important to first clean and normalize the raw text before converting it to numerical values. There are some relevant terms to be aware of:

  • Corpus: a corpus can be a single…
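
To make the idea of turning text into numbers concrete, here is a minimal sketch that cleans two short sentences and converts them into word-count vectors. It is only an illustration under my own assumptions: the helper names (`normalize`, `build_vocabulary`, `vectorize`) and the sample sentences are hypothetical, and the cleaning step is deliberately simplistic compared with a real pipeline.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and keep only runs of letters (a very simple cleaning step)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the documents to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in normalize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector with one slot per vocabulary token."""
    counts = Counter(normalize(text))
    vector = [0] * len(vocabulary)
    for tok, idx in vocabulary.items():
        vector[idx] = counts.get(tok, 0)
    return vector


# Hypothetical sample documents, made up for this example.
documents = ["The cat sat.", "The cat saw the dog!"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(vocabulary)  # {'cat': 0, 'dog': 1, 'sat': 2, 'saw': 3, 'the': 4}
print(vectors)     # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Each sentence becomes a list of counts whose positions line up with the shared vocabulary, which is the kind of numerical input a model can work with.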

https://www.kdnuggets.com/wp-content/uploads/xkcd-p-value-jellybeans.jpg

Statistical Significance is one of the most important concepts in statistics. It is widely used in all sorts of scientific publications and is a fundamental building block of many common statistical tests, such as ANOVA. People often associate Statistical Significance with the ‘p-value’ and pair it with thresholds such as 0.1, 0.05, etc. It’s pretty straightforward to apply Statistical Significance on paper, since it’s just comparing numbers. However, I still found the concept hard to comprehend. For example, what does it really mean when something is said to be statistically significant? …
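
As a rough illustration of the "just comparing numbers" part, the sketch below computes a p-value for a made-up coin-flip experiment and compares it with a 0.05 threshold. Everything here is an assumption chosen for the example: the scenario (16 heads in 20 flips), the function name `two_sided_p_value`, and the choice of an exact two-sided binomial test against a fair coin.

```python
from math import comb


def two_sided_p_value(n_flips, n_heads):
    """Exact two-sided p-value for a fair coin (probability of heads = 0.5).

    Because the fair-coin distribution is symmetric, the p-value is twice the
    probability of a result at least as far from n_flips / 2 as the one observed.
    """
    extreme = max(n_heads, n_flips - n_heads)
    tail = sum(comb(n_flips, k) for k in range(extreme, n_flips + 1)) / 2 ** n_flips
    return min(1.0, 2 * tail)


alpha = 0.05  # significance threshold, chosen before looking at the data
p_value = two_sided_p_value(n_flips=20, n_heads=16)  # hypothetical experiment

print(round(p_value, 4))  # 0.0118
print(p_value < alpha)    # True
```

Here the p-value falls below the threshold, so the result would be called statistically significant at the 0.05 level. What that statement actually means is the harder question.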


Photo by NASA on Unsplash

Stats has always been a subject that I found abstract and hard to comprehend. However, it is one of the most important building blocks of Data Science and Machine Learning. With increasingly accessible computing power and the availability of advanced packages for statistical analysis, one could probably perform a data analysis and build a model fairly quickly without having a thorough understanding of the key statistical concepts. But I would argue that it will likely not be a good one. …


So far I have talked about my motivation for becoming a Data Scientist, as well as my study plan. In this final part, I will cover my reasons for doing another master’s degree.

If you have been following my last two posts in this series, you might think that everything was going great with this self-guided experience, so why school all of a sudden? Well, it had certainly been a fruitful journey for me, and I could confidently say that I was more knowledgeable than a non-technical person, but it simply wasn’t enough.

Problems with Self-directed Learning

It was already December of 2019 and I…


Why the Career Change

Previously, I talked about my experience with data-driven technologies. As an engineer without any machine learning knowledge, these experiences made me realize how imperative data is. More importantly, they gave me an opportunity to re-evaluate my career plan at the time and the medical device industry in general. This ultimately led me to where I am right now:

I had a love-hate relationship with the medical device industry. First of all, I spent my undergrad in this field and dedicated an entire graduate degree to it. If it wasn’t passion, I don’t know what else it could be. Additionally, a…


About me

Hi everyone! My name is Michael, and my friends often call me Mike or MT. And of course, I’m just another aspiring Data Scientist, like most of you out there on this platform. But unlike some of you, I am also an ex-biomedical engineer. I am interested in all things data-driven, but particularly in uncovering patterns that have temporal attributes, such as trends and sequences. And because of my prior educational and industry experience in healthcare, I’m really passionate about constructing scalable data solutions that are also socially meaningful.

Just a bit about my Medium

At the time of this post, I have been…

Michael Tang

M.S. in Data Science candidate, 2022 @ Duke University | Biomedical Engineer | Workout Enthusiast
