Does your business depend on Python to operate on a daily basis? If you say no, you should check again. Python is everywhere. It wasn’t long ago that Python was used only for ad hoc purposes like writing test tools for web services or occasional data preparation tasks, where performance wasn’t a concern. Of course, it’s been used for several years for web development, thanks to popular frameworks like Django and Flask, and more recently in the disciplines of scientific computing and data science. While all of these examples paint a picture of Python’s versatility, they don’t quite convey how prevalent the Python language is or how powerful the community and ecosystem built over the past 30 years have become.
For more articles on the future of data management in 2022, download the Big Data Quarterly: Data Sourcebook (Winter 2021) Edition
High-level development language
It is important to understand how and why Python earned its place within all these different types of software solutions. Python is a high-level development language, which means that it removes a lot of boilerplate (standard or redundant code), making it a more succinct language. This in turn makes it easier to learn.
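To make the point about succinctness concrete, here is a minimal sketch that counts word frequencies in a handful of lines. The `top_words` helper and the sample sentence are hypothetical, chosen only for illustration; they are not from any particular codebase.

```python
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words in text, ignoring case."""
    # Lowercase and split on whitespace; no class or type ceremony needed.
    words = text.lower().split()
    return Counter(words).most_common(n)


# Hypothetical sample input for illustration.
sample = "the quick brown fox jumps over the lazy dog the end"
print(top_words(sample, 1))  # prints [('the', 3)]
```

There is no class wrapper, entry-point declaration, or explicit loop here, which is the kind of overhead the paragraph above describes Python as removing.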
When comparing Python to a language like Java and its ecosystem, it is clear that Python took a very different path to adoption. Java promised a “write once, run anywhere” solution, which was very attractive. It quickly built a support base and a thriving suite of enterprise apps to drive adoption. Despite being a higher-level language, Java still requires quite a bit of boilerplate code, and learning it is still somewhat complicated compared to Python. This limited Java’s appeal to certain audiences. More importantly for this discussion, those in the scientific community just wanted the language to get out of the way so they could focus on science.
Like most languages, Python has flaws and shortcomings. However, these shortcomings have produced a very strong community, collectively interested in working together to address the language’s limitations. This community is one of the strongest out there and is one of the biggest reasons for Python’s success.
Speed and performance
Python has long taken a beating on performance, that is, on how fast its code can execute. How much speed matters in a given use case depends on who you are and what the code is for. Although Python has historically not been the fastest language, its ease of use offered a trade-off that most were willing to make. This makes sense in many ways: if someone can spend a few minutes writing 10 lines of code in Python versus several more minutes writing 50 lines in Java, and they will only run the code a few times, why not save time writing the code and lose a little while the software executes its instructions? This approach is very attractive to the scientific computing and data science communities. However, as workloads have grown, so has the time it takes to execute this code. This left an opening for other languages, such as Scala with Spark; however, Scala failed to catch on due to its high level of complexity.
Given these longer runtimes, the community has found a way to improve Python’s performance story. Python can deliver serious performance when its libraries are implemented in low-level languages such as C++ and then exposed to Python users for easy consumption. The appeal of this approach is that it combines Python’s ease of use with the performance of native C++, which significantly reduces execution times.
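The idea of calling compiled code from Python can be sketched with nothing but the standard library. The example below assumes the CPython interpreter, where the built-in `sum` is implemented in C; it compares that built-in against an equivalent hand-written loop (the `python_sum` helper is hypothetical). It stands in for the library pattern described above and is not a benchmark of any specific C++ library.

```python
import timeit


def python_sum(values):
    """Add the values one at a time in interpreted Python bytecode."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(1_000_000))

# Both approaches compute the same result.
assert python_sum(data) == sum(data)

# Timings vary by machine and interpreter, so no numbers are claimed here.
loop_time = timeit.timeit(lambda: python_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)
print(f"Python loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```

The calling code looks the same either way, which is the point: the user keeps Python's syntax while the heavy lifting happens in compiled code.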
Mathematics and science
Given that Python has a dynamic 30-year history, for the sake of brevity, let’s cut to the chase. Python completely dominates other languages when it comes to scientific computing and data science. We are not getting into a Fortran-type argument here. Instead, we are looking at the ubiquity of Python in math and science problems, which are of course the core disciplines behind the industry’s most exciting topics, such as machine learning and deep learning. No wonder Python is so popular.
Many people and many reasons have contributed to this prevalence. I think one stands above the rest: NumFOCUS, and specifically PyData, its data-centric community in the Python ecosystem.
There are quite a few libraries in this ecosystem that are fundamental to many use cases across industries, most notably NumPy, pandas, scikit-learn, Matplotlib, Dask and Jupyter.
Dask provides a framework to help scale Python workloads. It’s not the easiest thing to use for those without a deeper level of engineering expertise, but it helps expand beyond a single machine.