Featured Article

Life-Cycle of Data

This article describes the different phases of the data life cycle, from data generation to data interpretation.


Life-Cycle of Data

June 09



Machine Learning in brief

June 01



Latest Blog

Life-Cycle of Data

Data is like the gold that every miner is digging for, and this gold is crafted into a fine piece of jewellery by an expert goldsmith, to be cherished for the brilliance, elegance and beauty of the finished masterpiece. Here, the miner and the goldsmith both stand for the Data Scientist, who chisels the data and constructs meaningful insights from it.

Data Life-Cycle (Image by author)

Extracting value from data ends with its interpretation by the end user, and the study of this whole process is referred to as Data Science. The simple life cycle of data is as follows: Generation, Collection, Processing, Storage, Management, Analysis, Visualization & Interpretation.

Generation: The data life cycle begins with generation. Data is generated by many sources, the most common being people: their various internet activities produce enormous amounts of data, which is consumed by big-tech companies and retail giants. The Internet of Things also contributes a huge share of the data generated, through sensors that collect information from their surroundings. Together, these sources generate unfathomable amounts of data, ranging from a few hundred gigabytes to several petabytes.
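
To get a feel for how sensor data piles up, the arithmetic can be sketched in a few lines of Python. The function name `daily_volume_bytes` and the figures used (10,000 sensors, each sending one 100-byte reading per second) are hypothetical, chosen only for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_bytes(num_sensors: int, readings_per_second: int, bytes_per_reading: int) -> int:
    """Raw bytes produced per day by a fleet of identical sensors (hypothetical model)."""
    return num_sensors * readings_per_second * bytes_per_reading * SECONDS_PER_DAY


# Hypothetical fleet: 10,000 sensors, each sending one 100-byte reading per second.
volume = daily_volume_bytes(10_000, 1, 100)
print(volume / 1e9, "GB per day")  # prints: 86.4 GB per day
```

At that rate the fleet would produce about 31.5 TB in a year, before any replication or indexing overhead, which shows how quickly modest per-device streams add up.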

Collection: Not all of the data that is generated actually gets collected. The main reasons for this gap are practical constraints, such as some data being irrelevant and data streaming in far faster than it can be processed.
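
Both causes of incomplete collection can be simulated with a toy producer/consumer loop. This is a minimal sketch under simplifying assumptions, not a real ingestion pipeline: on each "tick" the source emits `produce_per_tick` records, irrelevant ones are filtered out, relevant ones wait in a buffer of fixed `capacity`, and the collector can only take `consume_per_tick` records from it. All names and parameters here are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def simulate_collection(
    records: Iterable,
    is_relevant: Callable[[object], bool],
    capacity: int,
    produce_per_tick: int,
    consume_per_tick: int,
) -> Tuple[List, int, int]:
    """Return (collected records, dropped as irrelevant, dropped because the buffer was full)."""
    if produce_per_tick < 1 or consume_per_tick < 1:
        raise ValueError("production and consumption rates must be at least 1")
    source = iter(records)
    buffer = deque()
    collected: List = []
    dropped_irrelevant = dropped_overflow = 0
    exhausted = False

    while not exhausted or buffer:
        # Producer side: the source emits records whether or not we are ready for them.
        if not exhausted:
            for _ in range(produce_per_tick):
                try:
                    record = next(source)
                except StopIteration:
                    exhausted = True
                    break
                if not is_relevant(record):
                    dropped_irrelevant += 1
                elif len(buffer) >= capacity:
                    dropped_overflow += 1  # arrived faster than it could be processed
                else:
                    buffer.append(record)
        # Consumer side: only a limited number of records can be processed per tick.
        for _ in range(min(consume_per_tick, len(buffer))):
            collected.append(buffer.popleft())

    return collected, dropped_irrelevant, dropped_overflow


# Ten readings arrive 4 per tick, but only 1 per tick can be processed (buffer holds 2).
collected, irrelevant, overflow = simulate_collection(range(10), lambda r: True, 2, 4, 1)
print(collected, irrelevant, overflow)  # [0, 1, 4, 8] 0 6
```

Only 4 of the 10 readings survive: nothing was irrelevant here, but 6 were lost simply because they arrived faster than they could be handled.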

Processing: Data processing entails cleaning and organizing data and transforming raw data into a usable format, getting it ready for the later phases. Encryption of data, for secure storage, also falls under the processing phase.
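
Cleaning, organizing and transforming can be illustrated on a handful of raw records using only the Python standard library. The field names (`city`, `temp_c`) and the rules (trim and title-case names, drop rows with missing or non-numeric values, remove duplicates, derive a Fahrenheit column, sort the output) are hypothetical choices for this sketch; encryption is not shown.

```python
from typing import Dict, List, Optional


def clean_records(raw_rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, object]]:
    """Clean, deduplicate and transform raw rows into a consistent, sorted format."""
    cleaned: List[Dict[str, object]] = []
    seen = set()
    for row in raw_rows:
        # Cleaning: normalise whitespace and capitalisation of the city name.
        city = (row.get("city") or "").strip().title()
        raw_temp = (row.get("temp_c") or "").strip()
        if not city or not raw_temp:
            continue  # drop incomplete rows
        try:
            temp_c = float(raw_temp)  # transforming: text -> number
        except ValueError:
            continue  # drop rows whose temperature is not numeric
        if (city, temp_c) in seen:
            continue  # drop duplicates
        seen.add((city, temp_c))
        cleaned.append({"city": city, "temp_c": temp_c, "temp_f": temp_c * 9 / 5 + 32})
    # Organizing: return the rows in a predictable order.
    return sorted(cleaned, key=lambda r: r["city"])


raw = [
    {"city": "  delhi ", "temp_c": "30"},
    {"city": "Mumbai", "temp_c": ""},
    {"city": "Delhi", "temp_c": "30.0"},
    {"city": "pune", "temp_c": "n/a"},
    {"city": "agra", "temp_c": "25"},
]
print(clean_records(raw))
# [{'city': 'Agra', 'temp_c': 25.0, 'temp_f': 77.0}, {'city': 'Delhi', 'temp_c': 30.0, 'temp_f': 86.0}]
```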

Storage: Today, data is mostly stored on magnetic media, such as hard disk drives and magnetic tapes, and on solid-state (flash) silicon chips, with individual devices holding from a few terabytes up to a few tens of terabytes. Emerging technologies such as advanced optical storage and DNA-based storage are being explored as future solutions, with the potential to store data on the order of petabytes or even exabytes.

Management: Relational databases have become very efficient at managing structured, text-based data, but they are poorly suited to other forms and types of data, such as audio, video, images and other unstructured data. Creating and using metadata can play a key role in making these diverse forms of data accessible and modifiable.
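
One common way to apply the metadata idea is a small catalog: the media files themselves live outside the database, while relational tables record descriptive metadata (type, size, tags) that can be queried. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the schema, file paths and tags are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

SCHEMA = """
CREATE TABLE assets (
    path       TEXT PRIMARY KEY,  -- where the actual audio/video/image file lives
    media_type TEXT NOT NULL,     -- e.g. 'audio', 'video', 'image'
    size_bytes INTEGER NOT NULL
);
CREATE TABLE asset_tags (
    path TEXT NOT NULL REFERENCES assets(path),
    tag  TEXT NOT NULL
);
"""


def build_catalog(items: Iterable[Tuple[str, str, int, List[str]]]) -> sqlite3.Connection:
    """Create an in-memory metadata catalog from (path, media_type, size_bytes, tags) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for path, media_type, size_bytes, tags in items:
        conn.execute("INSERT INTO assets VALUES (?, ?, ?)", (path, media_type, size_bytes))
        conn.executemany("INSERT INTO asset_tags VALUES (?, ?)", [(path, t) for t in tags])
    conn.commit()
    return conn


def find(conn: sqlite3.Connection, media_type: str, tag: str) -> List[str]:
    """Return paths of assets of the given type carrying the given tag, using metadata only."""
    rows = conn.execute(
        """SELECT a.path FROM assets AS a
           JOIN asset_tags AS t ON t.path = a.path
           WHERE a.media_type = ? AND t.tag = ?
           ORDER BY a.path""",
        (media_type, tag),
    )
    return [path for (path,) in rows]


catalog = build_catalog([
    ("audio/call_001.wav", "audio", 1_200_000, ["support", "english"]),
    ("audio/call_002.wav", "audio", 900_000, ["support", "hindi"]),
    ("video/demo.mp4", "video", 50_000_000, ["product", "demo"]),
    ("images/logo.png", "image", 20_000, ["brand"]),
])
print(find(catalog, "audio", "support"))  # ['audio/call_001.wav', 'audio/call_002.wav']
```

The audio files are never opened: the query works entirely on the metadata, which is what makes unstructured data searchable with an ordinary relational database.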

Analysis: Data analysis is considered the heart of Data Science, but just as the heart is not the whole body, analysis is not the whole of Data Science. Data Science is made up of techniques and methods such as data wrangling, machine learning and statistical inference, used to extract valuable insights or predict outcomes.
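
As a small taste of the statistical side of analysis, the sketch below fits a straight line by ordinary least squares and uses it to predict an outcome. It is plain Python with no libraries; the function name and the toy data are hypothetical, and real projects would typically use libraries such as NumPy, SciPy or scikit-learn instead.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Toy data: advertising spend (x) vs. sales (y), in arbitrary units.
slope, intercept, r2 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r2)   # 2.0 1.0 1.0
print(slope * 5 + intercept)  # predicted y for x = 5: 11.0
```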

Visualization: Humans are not very good at comprehending huge numbers of data points just by looking at them, so data visualization helps present a clear, lucid picture of even very large datasets. It is an effective way to convey the results of data analysis in visual form.
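
A histogram is one of the simplest ways visualization condenses many points into a readable picture: values are grouped into bins and each bin is drawn as a bar. Plotting libraries such as Matplotlib do this graphically; the dependency-free sketch below (function name and parameters hypothetical) prints the same idea as text.

```python
import random
from typing import List, Sequence, Tuple


def text_histogram(values: Sequence[float], bins: int = 10, width: int = 40) -> Tuple[List[int], str]:
    """Count values into equal-width bins and render each bin as a bar of '#' characters."""
    if not values or bins < 1:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / step), bins - 1)  # the maximum value goes in the last bin
        counts[index] += 1
    peak = max(counts)
    # Each line shows the bin's left edge, a bar scaled to `width`, and the raw count.
    lines = [
        f"{lo + i * step:8.2f} | {'#' * round(c / peak * width)} {c}"
        for i, c in enumerate(counts)
    ]
    return counts, "\n".join(lines)


# 10,000 synthetic points drawn from a normal distribution (mean 50, sd 10).
rng = random.Random(42)
data = [rng.gauss(50, 10) for _ in range(10_000)]
counts, chart = text_histogram(data, bins=12)
print(chart)
```

Ten thousand numbers are unreadable as a list, but a dozen bars immediately show where the values concentrate and how they spread.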

Interpretation: Data interpretation is the explanation of what the data shows once it has been visualized: its context, important considerations and practical implications. In short, it requires a person to weave a story around the data outcomes and formulate a feasible solution or prediction.

Human: Last comes the human who actually consumes this data. This can be a financier who needs data science to manage clients' money effectively, a businessman who urgently needs to run the business efficiently and serve customers better, or a scientist who makes a new discovery through data. All in all, data and its science are the present-day gold, and every individual and institution is betting on it to reap the benefits.

This article is an adaptation of The Data Life Cycle, a post by Jeannette M. Wing on the different phases of the data life cycle in a data science project, from data generation all the way to data interpretation.

Machine Learning in brief

In general terms, machine learning is the science of programming computers so they can learn from data.

A more general definition, given by Arthur Samuel in 1959, describes machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

Machine learning algorithms can be categorized into: Supervised, Unsupervised, Semi-supervised and Reinforcement Learning.

Supervised Learning

In supervised learning, the algorithm is fed the desired solutions (labels) along with the training dataset. Common supervised learning algorithms include:

  • Linear Regression
  • Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Support Vector Machines (SVM)
  • Decision Trees & Random Forests
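
To make the role of labels concrete, here is a minimal k-nearest-neighbors classifier, one of the algorithms listed above, written from scratch in Python. The function name, toy points and labels are hypothetical; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_X: Sequence[Point], train_y: Sequence[Hashable], query: Point, k: int = 3) -> Hashable:
    """Predict a label for `query` by majority vote among its k nearest labelled training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every labelled example, nearest first.
    distances = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Training data: each point comes with its desired answer (the label).
X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y = ["small", "small", "small", "large", "large", "large"]
print(knn_predict(X, y, (1.5, 1.5)))  # small
print(knn_predict(X, y, (6.5, 6.2)))  # large
```

An odd `k` is a common choice for two-class problems because it avoids tied votes.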

Unsupervised Learning

In unsupervised learning, the training dataset contains no such labels or solutions; the system tries to figure out the structure by itself. Common unsupervised learning algorithms include:

  • K-Means Clustering
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Hierarchical Cluster Analysis (HCA)
  • One-class SVM
  • Isolation Forest
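
Without labels, an algorithm such as k-means has to discover the groups itself. The sketch below is the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. As an assumption made to keep the example deterministic, it is initialized with a farthest-first heuristic rather than the randomized k-means++ seeding that libraries such as scikit-learn use by default; the function names and toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Pick the first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def k_means(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Cluster unlabelled points into k groups with Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first(points, k)
    clusters: List[List[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```

No labels were supplied, yet the two groups are recovered. Note that k-means can converge to a poor local optimum from a bad starting point, which is why the choice of initialization matters.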

sinxfactor

My personal blogging site, built with Bootstrap 5. This blog primarily focuses on Machine Learning & Python.

πŸ“ Relevent Articles

  1. Life-Cycle of Data
  2. Machine Learning in brief

🌐 Social Connect

  1. GitHub
  2. Kaggle