
Data Analytics – with a case study

Apr 1, 2021

Data Analytics is the scientific process of refining raw data in order to get meaningful information.

Algorithms can be used to analyse the data in a systematic way, and some mechanical processes can be used as well.

The information extracted from the refining process can be used to improve the overall efficiency of a business system, and thereby its overall revenue.

What is Data Analytics

Data analytics is a broad term that refers to various types of data analysis. For example, manufacturers can use these methods to work out which attributes should be set at which level or quantity in order to get the maximum output.

Content companies use much of their data to keep you clicking and watching, so that they get another view or another click.

There are several steps involved in data analytics:

  • The first step is to decide how the data is to be grouped. It may be grouped by age, demographic, wealth or gender. Data values can be numerical for easy calculation.
  • The second step is to collect the data for analysis. This can be done through computers, online resources or by approaching people directly.
  • Once the data is collected, it must be organised so that it can be analysed. Organisation may take place in a spreadsheet or in other software that takes statistical data as input.
  • The next step is to clean the raw data before analysis. It is scrubbed and checked to ensure that it is complete and contains no duplicates or errors.
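
The cleaning step can be sketched in a few lines of Python. This is only a minimal illustration, not a full cleaning pipeline: the field names (`name`, `age`, `gender`) and the sample records are hypothetical, and "clean" is assumed to mean nothing more than "no exact duplicates and no empty fields".

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"name": "Asha", "age": "34", "gender": "F"},
    {"name": "Ravi", "age": "41", "gender": "M"},
    {"name": "Asha", "age": "34", "gender": "F"},  # exact duplicate
    {"name": "Meera", "age": "", "gender": "F"},   # incomplete: age is missing
]


def clean(records):
    """Drop exact duplicates and records that have an empty field."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # skip a record we have already kept
        seen.add(key)
        if any(value.strip() == "" for value in record.values()):
            continue  # skip a record with a missing value
        cleaned.append(record)
    return cleaned


cleaned_records = clean(raw_records)
print(len(raw_records), "raw records,", len(cleaned_records), "after cleaning")
```

Real data usually needs more than this, for example checks on value ranges or spelling variants, but the idea of scrubbing before analysing is the same.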

Why Data Analytics has become important

Data analytics has become important because it helps businesses optimize their performance.

With the help of analysis, a company can adopt a business model that reduces its production costs considerably, by identifying more efficient ways of carrying out its day-to-day activities and by storing large volumes of data.

Companies can also use data analytics to make better business decisions, to analyse customer trends and behaviours as and when required, and to improve product quality whenever a new product is launched.

Types of Data Analytics

Data analytics is broken down into four basic types.

  1. Descriptive Data analytics 

Descriptive data analytics uses historical data to find out what happened in an organisation over a specific past period. It answers questions such as how many views and sales there were in a particular period, say one month, and is usually used to compare sales across specific periods.
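
A comparison of this kind can be sketched as follows. The sales figures are invented purely for illustration; the sketch totals the sales for each month and prints the change between consecutive months.

```python
from collections import defaultdict

# Hypothetical (month, sale amount) records, invented for illustration.
sales = [
    ("2021-01", 120.0), ("2021-01", 80.0),
    ("2021-02", 150.0), ("2021-02", 90.0), ("2021-02", 60.0),
    ("2021-03", 210.0),
]


def monthly_totals(records):
    """Sum the sale amounts for each month, ordered by month."""
    totals = defaultdict(float)
    for month, amount in records:
        totals[month] += amount
    return dict(sorted(totals.items()))


totals = monthly_totals(sales)
months = list(totals)
# Compare each month with the one before it.
for previous, current in zip(months, months[1:]):
    change = totals[current] - totals[previous]
    print(f"{previous} -> {current}: {change:+.1f}")
```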

  2. Diagnostic Data analytics

Diagnostic analytics focuses on particular incidents in the organisation rather than on a product’s behaviour over a period of time. It involves more diverse inputs and some hypothesizing. Asking questions about specific events and trying to find the answers falls into this category of analysis. In the discovery process, analysts identify the data sources that will help them interpret the results. Drilling down involves focusing on a certain facet of the data or a particular widget.

  3. Predictive Data analytics

Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events.

In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of the risk or potential associated with a particular set of conditions, guiding decision-making for candidate transactions.

  4. Prescriptive Data analytics

Prescriptive analytics suggests a course of action. It makes use of machine learning to help businesses decide on a course of action based on a computer program’s predictions. Prescriptive analytics works with predictive analytics, which uses data to determine near-term outcomes.

Data analytics underpins many quality control systems in the financial world, including the ever-popular Six Sigma program. If you are not properly measuring something, whether it’s your weight or the number of defects per million in a production line, it is nearly impossible to optimize it.
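
As a small illustration of measuring before optimizing, the sketch below computes a defects-per-million figure for a production line. The function name and the counts are hypothetical, and it assumes the simplest definition: defects divided by units produced, scaled to one million units.

```python
def defects_per_million(defects: int, units: int) -> float:
    """Defect rate scaled to one million units produced."""
    if units <= 0:
        raise ValueError("units must be positive")
    return defects / units * 1_000_000


# Hypothetical line: 7 defective items out of 20,000 produced.
rate = defects_per_million(defects=7, units=20_000)
print(f"{rate:.0f} defects per million units")
```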

Use Case

Here data analytics is used for diagnosis: the disease is identified by analysing the symptoms of patients.

A sample space of 10 patients is used here for ease of calculation.

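| Patient | Sore Throat | Fever | Swollen Gland | Congestion | Headache | Diagnosis |
|---|---|---|---|---|---|---|
| P1 | Yes | Yes | Yes | Yes | Yes | Strep Throat |
| P2 | No | No | No | Yes | Yes | Allergy |
| P3 | Yes | Yes | No | Yes | No | Cold |
| P4 | Yes | No | Yes | No | No | Strep Throat |
| P5 | No | Yes | No | Yes | No | Cold |
| P6 | No | No | No | Yes | No | Allergy |
| P7 | No | No | Yes | No | No | Strep Throat |
| P8 | Yes | No | No | Yes | Yes | Allergy |
| P9 | No | Yes | No | Yes | Yes | Cold |
| P10 | Yes | Yes | No | Yes | Yes | Cold |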

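Information gain is the entropy of the sample space minus the weighted entropy that remains after splitting on an attribute A:

\[ I(A) = E(S) - \sum_{v} \frac{|S_v|}{|S|}\, E(S_v), \qquad E(S) = -\sum_{i} p_i \log_2 p_i \]

Here S is the sample space, S_v is the set of patients for which A takes the value v, and p_i is the fraction of patients in S with diagnosis i.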

First calculate the entropy of the sample space.

Our sample space consists of 10 diagnoses: Strep Throat (3), Allergy (3) and Cold (4).

E(sample space) = -(0.3*log2(0.3) + 0.3*log2(0.3) + 0.4*log2(0.4)) = 1.57 approx
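
The entropy of the sample space can be checked with a short script. This is a sketch that assumes the standard Shannon entropy with base-2 logarithms is the formula intended here; the function name `entropy` is only illustrative.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# Diagnosis counts in the sample space: Strep Throat 3, Allergy 3, Cold 4.
sample_entropy = entropy([3, 3, 4])
print(round(sample_entropy, 2))  # 1.57
```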

Now find the entropy of each attribute, starting with Sore Throat. Take the patients with the value yes for Sore Throat and count the Strep Throat, Allergy and Cold diagnoses among them. Repeat the same for the value no.

| Sore Throat | Strep Throat | Allergy | Cold |
|---|---|---|---|
| Yes | 2 | 1 | 2 |
| No | 1 | 2 | 2 |

Now find the entropy for the yes value and for the no value of the Sore Throat attribute from these counts.

E(sore throat = yes) = -(2/5*log2(2/5) + 1/5*log2(1/5) + 2/5*log2(2/5)) = 1.52 approx

The no value has the same counts in a different order (1, 2, 2), so its entropy is also 1.52 approx.

The entropy of Sore Throat is the average of the two, weighted by the share of patients in each group (5 out of 10 each):

E(sore throat) = 0.5*1.52 + 0.5*1.52 = 1.52 approx

The information gain of Sore Throat is then

I(Sore Throat) = E(sample space) - E(attribute) = 1.57 - 1.52 = 0.05
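
The Sore Throat calculation can be reproduced from the yes and no counts listed above. This is a sketch that again assumes base-2 Shannon entropy; the helper names are illustrative.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, branch_counts):
    """Parent entropy minus the size-weighted entropy of each branch."""
    total = sum(parent_counts)
    remainder = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - remainder


# Counts are ordered (Strep Throat, Allergy, Cold).
sore_throat_yes = [2, 1, 2]
sore_throat_no = [1, 2, 2]
gain = information_gain([3, 3, 4], [sore_throat_yes, sore_throat_no])
print(round(entropy(sore_throat_yes), 2), round(gain, 2))  # 1.52 0.05
```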

Using this formula, find the information gain of all attributes:

| Attribute | Information gain |
|---|---|
| Sore Throat | 0.05 |
| Fever | 0.72 |
| Swollen Gland | 0.88 |
| Congestion | 0.45 |
| Headache | 0.05 |
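
These gains can be recomputed directly from the patient data. The sketch below encodes yes as 1 and no as 0, with P2's headache and P10's fever set to yes; that reading is an assumption, chosen because it is the one that reproduces the gains listed above. The names `ATTRIBUTES`, `PATIENTS` and `information_gain` are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["sore_throat", "fever", "swollen_gland", "congestion", "headache"]

# One row per patient, P1 to P10: five symptoms (1 = yes, 0 = no), then the diagnosis.
# Assumption: P2 headache and P10 fever are taken as yes (see the note above).
PATIENTS = [
    (1, 1, 1, 1, 1, "Strep Throat"),
    (0, 0, 0, 1, 1, "Allergy"),
    (1, 1, 0, 1, 0, "Cold"),
    (1, 0, 1, 0, 0, "Strep Throat"),
    (0, 1, 0, 1, 0, "Cold"),
    (0, 0, 0, 1, 0, "Allergy"),
    (0, 0, 1, 0, 0, "Strep Throat"),
    (1, 0, 0, 1, 1, "Allergy"),
    (0, 1, 0, 1, 1, "Cold"),
    (1, 1, 0, 1, 1, "Cold"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    """Entropy of the diagnoses minus the weighted entropy after splitting on one symptom."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[index] for row in rows}:
        branch = [row[-1] for row in rows if row[index] == value]
        remainder += len(branch) / len(rows) * entropy(branch)
    return entropy(labels) - remainder


gains = {name: round(information_gain(PATIENTS, i), 2) for i, name in enumerate(ATTRIBUTES)}
for name, gain in gains.items():
    print(f"{name:14s}{gain:.2f}")
```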

Now, to draw the decision tree, find the root node. The attribute with the highest information gain becomes the root node. Here Swollen Gland is selected as the root node.

For the value yes of Swollen Gland, the diagnosis is Strep Throat. For the value no, more than one diagnosis is still possible, so we have to select the next node. That is Fever, because it has the next highest gain. For the value no of Fever the diagnosis is Allergy, and for the value yes the diagnosis is Cold.

In this way we can construct decision trees for any sample space using learning algorithms.
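
The whole construction can be sketched as a small recursive function. This is a minimal ID3-style sketch, not a production implementation. The article picks Fever as the attribute with the next highest overall gain; the sketch instead recomputes the gain on the patients that remain at each node, which is how ID3 is usually described, and for this data it arrives at the same tree. As an assumption, P2's headache and P10's fever are encoded as yes, the reading consistent with the gains listed earlier. All names are illustrative.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["sore_throat", "fever", "swollen_gland", "congestion", "headache"]

# Patients P1 to P10: five symptoms (1 = yes, 0 = no), then the diagnosis.
# Assumption: P2 headache and P10 fever are taken as yes.
PATIENTS = [
    (1, 1, 1, 1, 1, "Strep Throat"), (0, 0, 0, 1, 1, "Allergy"),
    (1, 1, 0, 1, 0, "Cold"),         (1, 0, 1, 0, 0, "Strep Throat"),
    (0, 1, 0, 1, 0, "Cold"),         (0, 0, 0, 1, 0, "Allergy"),
    (0, 0, 1, 0, 0, "Strep Throat"), (1, 0, 0, 1, 1, "Allergy"),
    (0, 1, 0, 1, 1, "Cold"),         (1, 1, 0, 1, 1, "Cold"),
]


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, index):
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[index] for row in rows}:
        branch = [row[-1] for row in rows if row[index] == value]
        remainder += len(branch) / len(rows) * entropy(branch)
    return entropy(labels) - remainder


def build_tree(rows, indices):
    """Return a diagnosis (leaf) or {attribute: {"yes"/"no": subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # every remaining patient has the same diagnosis
    if not indices:
        return Counter(labels).most_common(1)[0][0]  # no symptoms left: majority vote
    best = max(indices, key=lambda i: information_gain(rows, i))
    remaining = [i for i in indices if i != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches["yes" if value else "no"] = build_tree(subset, remaining)
    return {ATTRIBUTES[best]: branches}


def diagnose(tree, symptoms):
    """Follow the tree for a dict such as {"swollen_gland": "no", "fever": "yes"}."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[symptoms[attribute]]
    return tree


tree = build_tree(PATIENTS, list(range(len(ATTRIBUTES))))
print(tree)
print(diagnose(tree, {"swollen_gland": "no", "fever": "yes"}))
```

The tree is kept as nested dictionaries so that it can be printed and followed by hand, which makes it easy to compare with the tree described in the text.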


Data analytics is a fantastic area of study as well as a career path for young professionals. Using decision tree algorithms like this one, we can make useful predictions in many areas of today’s world. Jobs in this area promise a decent salary compared with other professions.

If you want to study data analytics further, please go through the following links:

  1. https://www.upgrad.com/data-science-pgd-iiitb/?utm_source=GOOGLE&utm_medium=SEARCH&utm_campaign=DV_DA_PGD_GOOGLE_SEARCH_HighIntent_IND_All&utm_content=Data_Science&utm_term=%2Bdata%20%2Banalytics%20%2Bcourses&gclid=CjwKCAjw3pWDBhB3EiwAV1c5rDxYUhOmX4U7FhgVPrA6VV_eJJKkjQP03dWOAw_ZyMfcz2u6UzOZHhoCxGIQAvD_BwE

  2. https://www.greatlearning.in/great-lakes-pgpdsba-data-analytics?&utm_source=Google&utm_medium=Search&utm_campaign=DSBA_Tier2_Data_Analytics_Search_Course&adgroup_id=114198425277&campaign_id=11765723956&Keyword=data%20analytics%20course%20online&placement=&utm_content=c&gclid=CjwKCAjw3pWDBhB3EiwAV1c5rExvk_vtXryW0oorSUoYFd-ziq_dYdZ4PlaUgdOQyap_xi7DIDtoAhoCeqMQAvD_BwE

To become a freelance data scientist

  1. https://www.upwork.com/hire/data-scientists/