Data Exploration – Summarising, Presenting and Compressing Data focuses on practical techniques for understanding, managing, and analysing data. The lecture begins by introducing the file formats common in Data Science: CSV, TSV, Excel, JSON, plain text, databases, HTML, and Pickle. It also covers working with compressed (ZIP) files, which reduce storage requirements and speed up data transfer. Students learn how to read these formats with Python and Pandas, and how to upload data into Google Colab by several methods.
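As a rough illustration of reading several of these formats with Pandas, the sketch below writes a tiny table to CSV, TSV, JSON, Pickle, and a ZIP archive, then reads each one back. The file names and the sample data are invented for this example and are not taken from the lecture; Excel, HTML, and database sources are left out because they need extra packages or a live connection.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, 2.5, 4.0]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")
    tsv_path = os.path.join(tmp, "sample.tsv")
    json_path = os.path.join(tmp, "sample.json")
    pkl_path = os.path.join(tmp, "sample.pkl")
    zip_path = os.path.join(tmp, "sample.zip")

    # Write the same table in each format.
    df.to_csv(csv_path, index=False)
    df.to_csv(tsv_path, sep="\t", index=False)
    df.to_json(json_path, orient="records")
    df.to_pickle(pkl_path)

    # Compress the CSV into a ZIP archive holding a single file.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname="sample.csv")

    # Read each format back into a DataFrame.
    from_csv = pd.read_csv(csv_path)
    from_tsv = pd.read_csv(tsv_path, sep="\t")  # TSV is CSV with a tab separator
    from_json = pd.read_json(json_path)
    from_pickle = pd.read_pickle(pkl_path)
    # read_csv can read directly from a ZIP archive that contains one file.
    from_zip = pd.read_csv(zip_path, compression="zip")

print(from_zip)
```

Pickle files can execute arbitrary code when loaded, so pd.read_pickle is best kept for files from a trusted source.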
The session then explores the four types of analytics (descriptive, diagnostic, predictive, and prescriptive) and highlights their real-world applications. Core statistical concepts are covered, including measures of central tendency (mean, median, mode), variability (range, variance, standard deviation, IQR), distribution shape (skewness, kurtosis), and relationships between variables (covariance, correlation, causality). Hypothesis testing and statistical significance are also introduced.
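The statistics listed above can be computed in a few lines with Pandas. The following is a minimal sketch on made-up numbers, not data from the lecture. The variable names are hypothetical, and the significance test shown is Welch's two-sample t-test from SciPy, chosen here only as one common example of a hypothesis test.

```python
import pandas as pd
from scipy import stats

# Hypothetical values, invented for illustration; y is an exact linear function of x.
x = pd.Series([2, 4, 4, 5, 7, 9, 11])
y = 2 * x + 1

# Central tendency
mean_x = x.mean()
median_x = x.median()
mode_x = x.mode().iloc[0]  # mode() returns a Series; take the first mode

# Variability (pandas uses the sample formulas, ddof=1, by default)
range_x = x.max() - x.min()
var_x = x.var()
std_x = x.std()
iqr_x = x.quantile(0.75) - x.quantile(0.25)

# Distribution shape
skew_x = x.skew()  # positive means a longer right tail
kurt_x = x.kurt()  # excess kurtosis (0 for a normal distribution)

# Relationships between variables
cov_xy = x.cov(y)
corr_xy = x.corr(y)  # Pearson correlation

# Hypothesis test: do two hypothetical groups have different means?
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.9, 7.2, 6.8, 7.5, 7.1]
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"mean={mean_x}, median={median_x}, mode={mode_x}")
print(f"range={range_x}, var={var_x}, std={std_x:.3f}, IQR={iqr_x}")
print(f"skew={skew_x:.3f}, kurtosis={kurt_x:.3f}")
print(f"cov={cov_xy}, corr={corr_xy:.3f}")
print(f"t={t_stat:.3f}, p={p_value:.5f}")
```

A correlation close to 1 here only reflects that y was constructed from x; on real data, a strong correlation does not by itself establish causality.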
Finally, the lecture covers clustering techniques such as K-means and DBSCAN, including their algorithms, evaluation metrics such as the silhouette score, advantages, limitations, and practical demonstrations on synthetic datasets.
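A demonstration along these lines might look like the sketch below, which uses scikit-learn to cluster a synthetic dataset with both K-means and DBSCAN and scores each result with the silhouette score. The blob centres, eps, min_samples, and random seed are assumptions picked for this example rather than values from the lecture.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (centres chosen for illustration).
centers = [(-5, -5), (0, 0), (5, 5)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# K-means needs the number of clusters up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
km_labels = kmeans.fit_predict(X)
km_silhouette = silhouette_score(X, km_labels)

# DBSCAN finds clusters from density; the label -1 marks noise points.
dbscan = DBSCAN(eps=1.0, min_samples=5)
db_labels = dbscan.fit_predict(X)

# Score DBSCAN on non-noise points only, and only if at least two clusters exist,
# since the silhouette score is undefined for a single cluster.
mask = db_labels != -1
n_db_clusters = len(set(db_labels[mask]))
db_silhouette = None
if n_db_clusters >= 2:
    db_silhouette = silhouette_score(X[mask], db_labels[mask])

print(f"K-means: 3 clusters, silhouette = {km_silhouette:.3f}")
print(f"DBSCAN: {n_db_clusters} clusters, {int(np.sum(~mask))} noise points")
if db_silhouette is not None:
    print(f"DBSCAN silhouette (noise excluded) = {db_silhouette:.3f}")
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. On compact, roughly spherical blobs like these both algorithms tend to do well; DBSCAN's results depend heavily on eps and min_samples, while K-means depends on choosing the number of clusters correctly.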