"Being a Data Scientist" focuses on the practical skills and tools required to work effectively in Data Science. The lecture begins by introducing code editors and Integrated Development Environments (IDEs), explaining their role in writing, testing, and debugging Python programs. Learners explore commonly used tools such as Jupyter Notebook, Google Colab, and PyCharm, along with the Anaconda distribution for managing Python packages and environments, and see how each supports data analysis and machine learning workflows. Emphasis is placed on why Google Colab is widely used: it runs in the cloud with no local setup, offers free (usage-limited) GPU access, and can open and save notebooks directly from GitHub.
The lecture then provides a hands-on introduction to essential Python libraries. NumPy is covered as the foundation for numerical computing and array operations. Pandas is introduced for data manipulation, cleaning, reshaping, handling missing values, and summarizing datasets. Matplotlib is presented as a visualization tool for creating and customizing plots.
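To make the roles of the three libraries concrete, the following is a minimal sketch rather than material from the lecture itself. The city labels, temperature readings, and variable names are hypothetical, chosen only for illustration. It shows a NumPy array operation, a pandas missing-value fill and summary, and a basic Matplotlib bar chart.

```python
import io

import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# NumPy: arithmetic applies to the whole array at once, with no explicit loop.
temps_c = np.array([21.5, 23.0, 19.8, 22.1])  # hypothetical readings
temps_f = temps_c * 9 / 5 + 32

# pandas: a small table with one missing reading (hypothetical data).
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "temp_c": [21.5, np.nan, 19.8, 22.1],
})

# Handle the missing value by filling it with the column mean,
# then summarize the column (count, mean, std, min, quartiles, max).
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())
summary = df["temp_c"].describe()

# Matplotlib: a bar chart with customized axis labels and title.
fig, ax = plt.subplots()
ax.bar(df["city"], df["temp_c"])
ax.set_xlabel("City")
ax.set_ylabel("Temperature (C)")
ax.set_title("Temperature by city")

# Render to an in-memory buffer; in a notebook, plt.show() would be used instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In Jupyter Notebook or Google Colab the `matplotlib.use("Agg")` line and the buffer are unnecessary, since figures render inline below the cell.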
Finally, the lecture explains Version Control Systems (VCS), including local, centralized, and distributed systems, and clarifies the difference between Git and GitHub. Students learn how version control supports collaboration, code management, and reproducibility in Data Science projects.