Business Intelligence - Assignment 2

In our Business Intelligence class last August 22 and 24, we covered sections 2 to 5 of Udemy's Business Analytics course. These sections included Contingency Tables, Measures of Distribution, Measures of Variation, Distribution Visualization, Bivariate Data, Correlation, and Regression Analysis.

The first topic we discussed was contingency tables, which are useful in areas such as survey analysis, business intelligence, engineering, and scientific research. These tables help us understand the relationships between categorical variables. We learned how to build them by aggregating data, counting frequencies, and calculating proportions. We also looked into the chi-square test, which uses the chi-square distribution to determine whether the differences observed in a contingency table are statistically significant.
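To make the counting and the chi-square idea concrete, here is a minimal sketch in Python. The survey records, the function names, and the two categories are all made up for illustration; they are not from the course. It only computes the chi-square statistic and the degrees of freedom, not a p-value.

```python
from collections import Counter

# Hypothetical survey records: (department, prefers_remote_work)
records = [
    ("Sales", "yes"), ("Sales", "no"), ("Sales", "no"), ("Sales", "no"),
    ("IT", "yes"), ("IT", "yes"), ("IT", "yes"), ("IT", "no"),
]

def contingency_table(pairs):
    """Count how often each (row, column) combination occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]

def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

rows, cols, table = contingency_table(records)
chi2 = chi_square_statistic(table)
dof = (len(rows) - 1) * (len(cols) - 1)  # degrees of freedom

print(rows, cols, table)
print("chi-square:", chi2, "degrees of freedom:", dof)
```

The statistic is then compared with a chi-square critical value for the given degrees of freedom. With only eight made-up records the expected counts are too small for a reliable test, so this is only meant to show the mechanics.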

We then looked at descriptive measures such as the mean, median, mode, minimum, maximum, and quintiles, as well as how to spot outliers. These measures summarize how the data is distributed. For example, the mean is the average value obtained by adding all values and dividing by the number of values in the dataset. We also talked about continuous probability distributions, with emphasis on the normal distribution, whose bell-shaped curve is determined by its mean and standard deviation. We were introduced to density functions, cumulative distribution functions (CDFs), and the 68-95-99.7 rule, which says that roughly 68%, 95%, and 99.7% of normally distributed data fall within one, two, and three standard deviations of the mean.
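A short sketch of these measures using Python's built-in statistics module may help. The sample values are hypothetical, and share_within is just a helper name chosen here. The second part checks the 68-95-99.7 percentages from the CDF of a standard normal distribution.

```python
import statistics
from statistics import NormalDist

# Hypothetical data; 40 is a deliberate outlier
values = [2, 3, 3, 5, 7, 8, 40]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print("mean:", round(mean, 2), "median:", median, "mode:", mode)
print("min:", min(values), "max:", max(values))

# Share of a normal distribution within k standard deviations of the mean,
# computed as CDF(k) - CDF(-k) on the standard normal (mean 0, sd 1)
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

Notice how the single outlier pulls the mean well above the median, which is one reason both are worth reporting.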

We also covered kurtosis, both positive and negative. Compared to a normal distribution, positive kurtosis indicates heavier tails (more extreme values), while negative kurtosis indicates lighter tails (fewer extreme values). We talked about skewed distributions as well: a positively skewed distribution has a longer right tail, and a negatively skewed distribution has a longer left tail. In both cases, and whenever outliers are present, the median is a better measure of central tendency than the mean.
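As a rough illustration, skewness and excess kurtosis can be computed from central moments. This sketch uses the simple population (moment-based) formulas, which may differ slightly from the sample-corrected versions in spreadsheet or statistics packages; the datasets are made up.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean)^k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Positive: longer right tail. Negative: longer left tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical datasets
right_skewed = [1, 1, 1, 2, 10]                     # long right tail
left_skewed = [-10, -2, -1, -1, -1]                 # long left tail
flat = [1, 2, 3, 4, 5]                              # light tails
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # rare extreme values

print("skewness:", skewness(right_skewed), skewness(left_skewed))
print("excess kurtosis:", excess_kurtosis(flat), excess_kurtosis(heavy_tailed))
```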

The principles of sampling were also discussed, covering populations and sampling methods such as random, stratified, and cluster sampling. We emphasized the need to select a representative sample in order to draw valid conclusions about a larger population.
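The three sampling methods can be sketched with Python's random module. The population of 100 people split into two regions is hypothetical, and the function names are only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people, 60 in the north and 40 in the south
population = [
    {"id": i, "region": "north" if i < 60 else "south"} for i in range(100)
]

def simple_random_sample(units, n):
    """Every unit has the same chance of being picked."""
    return rng.sample(units, n)

def stratified_sample(units, key, fraction):
    """Sample the same fraction from each group (stratum)."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def cluster_sample(units, key, n_clusters):
    """Pick whole groups (clusters) at random and keep all their members."""
    clusters = sorted({unit[key] for unit in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [unit for unit in units if unit[key] in chosen]

simple = simple_random_sample(population, 10)
stratified = stratified_sample(population, "region", 0.1)
cluster = cluster_sample(population, "region", 1)

print(len(simple), len(stratified), len(cluster))
```

Stratified sampling keeps the 60/40 split of the population in the sample, which is one way of keeping the sample representative.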

Moving on to Bivariate Data, this section taught us how to use scatter plots to analyze pairs of variables and uncover relationships between them. We focused on positive and negative relationships, as well as the significance of linear patterns in the data. Three strategies for handling missing or null values in datasets were also presented.
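One common way to put a number on a positive or negative linear relationship is the Pearson correlation coefficient. The sketch below computes it by hand; the variable names and figures are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect positive linear relationship,
    -1 a perfect negative one, and values near 0 mean no linear pattern."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

# Hypothetical paired observations
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]      # rises with ad_spend
returns = [10, 8, 6, 4, 2]    # falls as ad_spend rises

r_positive = pearson_r(ad_spend, sales)
r_negative = pearson_r(ad_spend, returns)
print(r_positive, r_negative)
```

A scatter plot is still worth drawing first, since the coefficient only captures linear patterns.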

In the section Unpacking Uncertainty and Entropy in Data Analysis, we examined how data analysis can reduce uncertainty. Entropy was introduced as a tool for studying uncertainty in events and outcomes. It measures how unpredictable an outcome is and how much information it carries. We saw how entropy helps analysts prioritize the most important outcomes for reporting, which in turn lowers uncertainty.
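A small sketch of Shannon entropy, measured in bits, shows how the measure behaves. The probabilities are hypothetical, and this is the standard textbook formula rather than anything specific to the course.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p) over all outcomes.
    Outcomes with probability 0 contribute nothing and are skipped."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])    # most uncertain two-outcome case
biased_coin = entropy([0.9, 0.1])  # more predictable, so lower entropy
certain = entropy([1.0, 0.0])      # no uncertainty at all

print(fair_coin, round(biased_coin, 3), certain)
```

The more predictable the outcome, the lower the entropy, which matches the idea that analysis reduces uncertainty.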

Finally, we looked into Logistic Regression for Binary Outcomes. Logistic regression is widely used when the outcome is categorical, especially when it has only two possible values. It models the probability of an outcome, which always lies between 0 and 1, using the logistic function, which fits this type of data better than a straight line. We also looked at the mathematics behind odds ratios and logistic regression predictions.
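The logistic function and the link to odds ratios can be sketched as follows. The intercept and slope are made-up coefficients, not values fitted in class, and predict_probability is a name chosen only for this example.

```python
import math

def logistic(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

def odds(p):
    """Odds of an event: probability it happens / probability it does not."""
    return p / (1 - p)

# Hypothetical coefficients, for illustration only
intercept, slope = -4.0, 0.8

def predict_probability(x):
    """Predicted probability of the positive outcome for input x."""
    return logistic(intercept + slope * x)

p5 = predict_probability(5)
p6 = predict_probability(6)

# Raising x by one unit multiplies the odds by exp(slope): the odds ratio
odds_ratio = odds(p6) / odds(p5)
print(p5, round(p6, 3), round(odds_ratio, 3), round(math.exp(slope), 3))
```

This is why the coefficients of a logistic regression are usually interpreted through their exponentials rather than directly.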

In summary, our class discussions covered fundamental topics in business analytics such as contingency tables, measures of distribution, bivariate data analysis, and logistic regression. These ideas are essential for understanding and evaluating data in a variety of professions.
