While there are multiple languages that data scientists can use, Python has essential advantages for data science. Each language has its specific history, purpose, strengths, and style. The choice of language should be informed by the type of work being done now and in the future. In this post, I make the (somewhat opinionated) case for why Python is the best choice as the primary language for data science coding shops.
In a word, Python gives data scientists leverage. I detail the features of the language that convey this benefit later in this post. However, this XKCD comic makes the point well and is only a slight exaggeration.
Python is used for data science because it gives unmatched leverage to collect data, process data, and deliver insights from data. Furthermore, code written in the "Pythonic" style is clear, readable, and easily shared. This encourages the iteration and exploration that lead to insights.
Python is a One-Stop Shop for Data Science Tasks
There are numerous languages that can be used to perform data science and advanced analytics, so what makes Python so well suited for the data scientist? To answer this, think about what a data scientist is. Data science is the general practice of taking data and using advanced analytics to derive insight. The generality comes from the varied data sources and knowledge domains in which we operate. As the saying goes: a data scientist is part developer, part statistician, and part data engineer.
This generality, together with the technical demands of the work, means that the data scientist must be prepared to perform every step of the analytics process. They should be able to move from data to insight and to work in unfamiliar domains. The data scientist needs a tool that can do all pipeline steps well. Other tools like SPSS, SAS, and R are highly refined for statistics, but they suffer from deficiencies at other stages of analysis. Python does everything well. It is not refined for a single task, but it can do all tasks. The only constant in data science is that we are tasked with every step of analysis in all domains. Python makes this possible.
Python Gets Out of the Way so You Can Get to Analysis
Python is lean and powerful; it gets right to the analysis stage. Other languages, like C and Java, have more built-in guardrails. In C, you need to manage memory allocation yourself. In Java and Scala, the programmer needs to declare and control object types carefully. In Python, many of these details are handled in the background. Python tends to abstract the user away from the gory details of the underlying computational backend. This level of abstraction does come with the risk that memory allocation will be less than optimal or that an object will be of the wrong type, but these errors can be fixed when it is essential. For the data scientist, who is mainly focused on analyzing data, this is a good trade-off.
Python lets the data scientist write lean and readable code that directly connects data to insight.
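To make the trade-off concrete, here is a minimal sketch. The readings and the helper name to_floats are made up for illustration. The snippet declares no types and manages no memory, and it fixes a wrong-typed value only at the point where it matters.

```python
# No type declarations and no manual memory management are required.
readings = [3, 4.5, "7"]  # an int, a float, and a string that slipped in


def to_floats(values):
    """Coerce every value to float; raises ValueError on non-numeric strings."""
    return [float(v) for v in values]


# Fix the type problem only where it matters: right before the arithmetic.
clean = to_floats(readings)
mean_reading = sum(clean) / len(clean)

# Integers grow as needed, so there is no overflow to plan for.
big = 2 ** 100

print(mean_reading, big.bit_length())
```

The cost is that the stray string is only noticed at run time, which is the kind of risk described above.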
Python is a Developer Language
One of the most apparent advantages of Python is that it is a developer language. This is not to say it is the best developer language, but any functionality that a developer normally expects is available and robust. Python can handle most data types and database connections. Python even ships with SQLite support through its built-in sqlite3 module, so a database can be used without a separate server and deployed as part of your program. Most API services have a Python client. If there is no client, you can use the requests library to call the API endpoints directly.
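As a small illustration of the built-in SQLite support, the sketch below uses only the standard-library sqlite3 module with an in-memory database. The table name and the values are placeholders invented for this example.

```python
import sqlite3

# ":memory:" creates a throwaway database; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample TEXT, value REAL)")

# Parameterized inserts keep data separate from the SQL text.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)

# Aggregate in SQL and pull the result back as plain Python tuples.
rows = conn.execute(
    "SELECT sample, AVG(value) FROM measurements "
    "GROUP BY sample ORDER BY sample"
).fetchall()
conn.close()

print(rows)
```

No server process or extra installation is involved; the database lives inside the Python process.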
In Python, dependency issues are well handled by dependency managers like conda and pip. In addition, virtual environments and Conda environments enable the user to maintain multiple versions of Python, each with a different suite of dependencies. Trying to set up multiple versions of R is much clumsier and more error-prone.
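Environments are usually created from the terminal, but the standard-library venv module can do the same from Python. The following is a rough sketch under some assumptions: the project names are hypothetical, pip is skipped so that creation is fast and works offline, and venv isolates dependency sets for the running interpreter only (choosing a different Python version is what Conda environments add).

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str) -> Path:
    """Create an isolated virtual environment under parent and return its path."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, which keeps this quick and offline.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical projects, each with its own separate environment.
    env_a = make_env(Path(tmp), "project-a")
    env_b = make_env(Path(tmp), "project-b")
    # Every venv contains a pyvenv.cfg file describing its base interpreter.
    created = [(env / "pyvenv.cfg").exists() for env in (env_a, env_b)]

print(created)
```

Because each environment is just a directory, deleting the directory removes the environment.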
Python has well-supported packages to deliver analytical endpoints and results to stakeholders and customers. Since data science is about communicating insights derived from data, these packages are major levers. Interactive graphics and dashboards can be made with HoloViews (Panel) and Plotly (Dash). Web apps can be built on the Flask and Django frameworks.
Other aspects make Python usable for developer workflows. Packages like PySpark and Dask allow scalable analytics and cluster management. However, the main point is that Python has momentum in developer communities; if a developer-facing process or package does not have a Python implementation, it is such a disadvantage that one will be produced in short order.
Python Offers Excellent Data Visualization Tools
Python facilitates storytelling with an increasingly diverse set of data visualization tools. We can use matplotlib for fine-grained control of plots. matplotlib.rcParams (rc stands for runtime configuration) is a dictionary-like object that allows us to override the standard plot aesthetics. This is particularly useful to specify report or company-themed aesthetics. Try this code to see the variety of plotting preferences that you can customize.
import matplotlib
matplotlib.rcParams
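Building on the listing above, a house style can be applied by updating a few entries. This is only a sketch: the specific values are arbitrary placeholders rather than recommended settings, and it assumes matplotlib is installed.

```python
import matplotlib

# Hypothetical "company theme": override a handful of the defaults.
house_style = {
    "font.size": 12,
    "axes.titlesize": 14,
    "lines.linewidth": 2.0,
    "figure.figsize": (8, 4.5),
}
matplotlib.rcParams.update(house_style)

# Every figure created after this point picks up the new defaults.
print(matplotlib.rcParams["lines.linewidth"])
```

The changes last for the current session; matplotlib.rcdefaults() restores the built-in settings.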
A collection of more stylized and powerful data visualization packages is built on top of the matplotlib scaffolding. For easy and aesthetically pleasing plots we use Seaborn and plotnine (a Python implementation of the grammar of graphics popularized by R's ggplot2). These packages abstract away from the detailed control afforded by matplotlib and give great results with a few lines of code. For interactive web-based visualizations, we have Bokeh and Plotly. Plotly goes beyond web-based visuals with Dash, which allows the user to make public or private dashboards. With a license, dashboards can be seamlessly deployed on the Plotly servers, further reducing the overhead associated with web app deployment and security.
There are many other data visualization packages, but the HoloViews family of tools is worth discussing. HoloViews is part of a family of tools that was built by Anaconda. It includes Datashader, which renders huge datasets as shaded densities instead of individual points. This application specifically addresses big data and can be used to make some stunning visuals. In addition, HoloViews introduced Panel, a flavor of dashboarding app. The data visualization capacity of Python is formidable and developing rapidly.
Python Demo
Because it is easy to talk (or write) a case for anything, the careful reader might demand more. What is the proof that Python gives you leverage to do important data science with minimal fuss?
Let me show you...
In this article, I have made the point that Python just works. Dream up a data science application, and with a few lines of clear and readable code, we can create an analysis that moves your team forward.
Let's demonstrate how easily we can go from a simple question to a data-backed analysis. The first question that comes to mind is: How popular is Python relative to other data science tools? Is this popularity changing year over year?
To measure this, we can look at a coding forum and count how many questions are asked about each language. What percentage of these questions are answered? This important piece of data science intelligence can be obtained easily and quickly with a few lines of Python. We might use this to validate already published insights, but we can also change the code interactively so that we can look at the question from new angles as we think of them.
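The counting step can be sketched as follows. The records below are made-up placeholders, not real forum data, and the field names tags and is_answered are an assumption about how question records from the forum's API are shaped. The function name summarize is also just illustrative.

```python
from collections import defaultdict


def summarize(questions):
    """Return {tag: (question_count, percent_answered)} for a list of records."""
    totals = defaultdict(int)
    answered = defaultdict(int)
    for question in questions:
        for tag in question["tags"]:
            totals[tag] += 1
            if question["is_answered"]:
                answered[tag] += 1
    return {
        tag: (totals[tag], 100.0 * answered[tag] / totals[tag])
        for tag in totals
    }


# Placeholder records for illustration only; real ones would come from the API.
sample = [
    {"tags": ["python"], "is_answered": True},
    {"tags": ["python"], "is_answered": False},
    {"tags": ["r"], "is_answered": True},
    {"tags": ["python", "pandas"], "is_answered": True},
]

summary = summarize(sample)
for tag, (count, pct) in sorted(summary.items()):
    print(f"{tag}: {count} questions, {pct:.1f}% answered")
```

Running the same function on records from different years would give the year-over-year comparison.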
Jupyter notebooks allow us to annotate the code in Markdown, so these notebooks can read much more like a paper than raw code. R has R Markdown (.Rmd) files that are similar, but these must be knit. Knitting means that all of the code needs to run or the knitting fails. This leads to time lost troubleshooting and less rapid iteration.
The use of Conda environments makes the installation of totally new packages easy and quick. Conda environments make spinning up new projects very dynamic; they avoid costly dependency incompatibilities and can be simply removed when we are done with them.
Let's start with StackAPI, a Python wrapper for the Stack Exchange API that serves Stack Overflow:
https://stackapi.readthedocs.io/en/latest/
You will need to have Anaconda and pip installed for this. Any code with a '$' preceding it is run in the terminal; the remainder of the code is executed in Jupyter notebook cells.
In summary, we had a question that could have very important implications for a business: Should we transition our data science to Python? Then we answered that question with a few lines of Python code. I want to pause to emphasize that this goes to the heart of the power of data science. We moved from a speculative discussion point to actionable primary data analysis in a single turn of very lean code. Now that we have identified a data source and rendered it into a visualization, we can add to it easily, generating more nuanced analysis. Questions lead to answers and better questions. Python gets out of the way.
Written by Gunnar Kleemann
Dr. Gunnar Kleemann runs a small, friendly data science shop, Austin Capital Data. Gunnar has over 25 years of experience teaching a broad array of STEM fields, acting as a teacher and advisor to students in a number of contexts at institutions including the Princeton University Genomics Institute, Barnard College, Albert Einstein College of Medicine, the University of Nebraska-Lincoln, K2, Data Society, the Princeton Review, and of course Accelebrate. He has been a Lecturer in UC Berkeley’s Master’s in Data Science (MIDS) program since 2016.
Gunnar is primarily interested in making the benefits of data science more broadly accessible since he believes that data science skills will be the core delimiters in the future world. To this end, he regularly presents his results at international conferences, most recently at All Things Open 2021. Gunnar has published research on physiological and behavioral genomics in the most prominent international journals, including Cell, Genetics, and the Journal of Neuroscience.