Why engineers should treasure Python over Excels of the world
I love Python and I love open-source communities. It was not love at first sight. At first, I may have been drawn into the ecosystem by hype and curiosity. Nowadays, though, I reap the benefits of this beautiful ecosystem, and it is easy for me to understand and talk about this relationship.
Over the last few years, I have tried to encourage my engineering colleagues to add Python scripting to their skillset (if you want to start your journey, I wrote a how-to here). Many of these colleagues and friends have spent years sharpening their skills in Microsoft Excel or other proprietary tools (OFM, for example, in Oil & Gas). The thought of learning to script in Python has been both exciting and scary for them. Generally, their reluctance falls under one of the points below:
- Sunk cost argument (different excuses, but all centred around “why should I learn a new tool when I have spent so much time learning the current ones?”)
- My challenge/model is too complex for anything that Python can handle
- General fear of scripting, which goes hand in hand with the next argument
- Anything you can do in Python I can do in Excel (replace Excel with any Graphical User Interface (GUI) software of your choice)
In this article I hope to show you why a basic knowledge of scripting in Python, and a working knowledge of some of its libraries such as Pandas, should be a must-have in any engineer’s toolbox.
I have worked in my industry in different roles, from external consultant to the engineer responsible for delivering projects in operating companies, and I have seen my share of useful but unreliable solutions built in Excel and similar tools. The following story should sound familiar to you:
- Senior Engineer/Manager (M): We have plenty of data to work with. We need x more engineers to be able to create and run all these models
- Smart junior/grad engineer (E): Hey, it should be alright. I know advanced Excel and VBA from my university days, so I should be able to automate the model generation piece and we can do this within our team
- M: Great, show me how your tool works, and let’s present it to upper management. We are great!
So far so good, and it would have been great if the story had finished here. As many of you may know, it does not. Time passes, and M thinks: why not capture more complexity in the Excel macro? As E adds more complexity, the Excel file grows larger and larger, and crashes start to occur. The file that used to open in a few seconds and run quickly now takes minutes to open; half the time it gets corrupted, and the other half it may crash in the middle of running the latest macros. This part is common to every case, but from this point on I have seen different scenarios play out in different companies:
- M and E decide to simplify their assumptions (sometimes to the point that they may not be valid, but the justification is we are constrained by the tool)
- M leverages the business continuity argument and complains to IT that it should supply a more stable version of Microsoft Excel or its environment
- E suggests that, instead of running Microsoft Excel on one machine, they run multiple instances on many machines, accepting 50% completion of the macros overnight as a success
Time goes by. Microsoft or the software vendor updates the software and adds some improvements, but things barely get better. Going back is not an option, and scaling is a challenge no one knew they were going to face.
One might think the logical conclusion would be for the engineering team to recognise that they are using Excel for something it is not suited to, and that it is not the best tool for their problem. Unfortunately, my experience has been quite different: on at least two occasions the conclusion was that our problem is so complex that no tool can really handle it!
The issue is that we sometimes focus so much on our own field of expertise that we forget many other industries have issues very similar to ours, and they may have developed solutions we can borrow. On average, 1 billion tweets are sent every 2 days; if Twitter can handle that and run all sorts of analytics on it, there should be tools capable of holding 400 simulation models for 1000 wells for 40 years. The great news is that the solution exists, it is free, and it has one of the best communities in the world supporting it. It is called Python, and all it requires from you is some time spent learning the language.
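To put the 400 models, 1000 wells and 40 years into perspective, here is a rough back-of-envelope sketch. The monthly timestep and the single 64-bit value per entry are my assumptions for illustration, not a description of any real model; the only hard number is the row limit of one Excel worksheet.

```python
# Back-of-envelope sizing of the dataset mentioned above.
# The granularity choices marked "assumption" are illustrative only.
N_MODELS = 400
N_WELLS = 1000
N_YEARS = 40
STEPS_PER_YEAR = 12          # assumption: monthly timesteps
BYTES_PER_VALUE = 8          # assumption: one 64-bit float per entry
EXCEL_MAX_ROWS = 1_048_576   # row limit of a single Excel worksheet


def dataset_size(models, wells, years, steps_per_year, bytes_per_value):
    """Return (number of rows, number of bytes) for one stored variable."""
    rows = models * wells * years * steps_per_year
    return rows, rows * bytes_per_value


rows, n_bytes = dataset_size(
    N_MODELS, N_WELLS, N_YEARS, STEPS_PER_YEAR, BYTES_PER_VALUE
)
print(f"{rows:,} rows, roughly {n_bytes / 1e9:.2f} GB per stored variable")
print(f"About {rows / EXCEL_MAX_ROWS:.0f} times the row limit of one worksheet")
```

Under these assumptions the data is far beyond what a single worksheet can hold, yet it is a modest size for tools built to work with data in memory or on disk.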
Why YOU should treasure Python over Excels of the world
If you are not convinced yet (and if I know my engineer friends and colleagues, chances are you are not), below I make the case for why Python is a far superior tool for playing with your data compared to Excel or any other proprietary tool.
It is also worth mentioning that I am a daily user of Microsoft Excel and of many other tools that are closed source or built around a Graphical User Interface (GUI). I want to persuade you that the initial cost of learning Python pays dividends for many years and is a huge bargain. My main arguments are:
- We should use more Open Source and less Proprietary software
- Scripts are powerful. They can eat GUI for breakfast any day
- There are future opportunities unforeseen today if you are not confined by your tools
Let me explain them in a bit more detail:
1) Use more Open-Source and less Proprietary software
IBM defines open-source software as:
“Open-source software (OSS) is a decentralized development model that distributes source code publicly for open collaboration and peer production known as ‘the open-source way.’”
What that means in day-to-day practice is that the source code of open-source software can be seen and changed by anyone. So, if your action causes the software to crash, you can try to find the bug, fix it for yourself, and share the fix with the world. If the maintainers accept it, the next version of the software includes the fix you made.
This may not sound hugely important if you are a beginner in Python and only plan to use it for automating some mundane tasks. But think about the irritating issues in popular software you use that have not been fixed for years. You can check this thread on the challenges Excel power users face when working with large datasets, or this issue a user has faced for years. The user's request does not get priority because it is not common, but for the user who has the issue, it is an everyday pain. You are dealing with a big corporation like Microsoft with a global user base. They have to prioritise which issues their resources address, and if your painful issue is not common, then tough luck.
2) Scripts can eat GUI for breakfast
Here I am going to give you two reasons for the superiority of scripts over GUIs:
- Engineering work requires a lot of mental power and focus. You cannot be interrupted constantly and still produce quality work. Working with scripts rather than GUIs seems scary at the beginning. As you get more comfortable writing scripts and navigating with text commands, you realise that staying in a single environment minimises distractions. This ability to focus can lead to higher quality work. A mouse and a GUI, while visually enticing, make you move from one screen to the next, so your eyes glance over many things on screen. These can distract you, and keeping yourself focused drains your mental strength.
- The second advantage of scripts over GUIs is the reproducibility of your work. There is a learning curve, and it may make you slower than the rest at the start. As you get comfortable, though, you will spend less time on preparation and more on analysing and engineering, while staying focused for longer. Take the simple case of copying a file, and imagine repeating or automating it. Compare moving the mouse, right-clicking and selecting copy, then navigating to the destination to paste, with the following (bash, not Python):
cp SourcePath/file DestinationPath/file
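The same idea carries over to Python once one file becomes many. The snippet below is a minimal sketch using only the standard library; the function name `copy_matching`, the folder names and the `*.csv` pattern are placeholders I made up for illustration.

```python
import shutil
from pathlib import Path


def copy_matching(source_dir, destination_dir, pattern="*.csv"):
    """Copy every file in source_dir that matches pattern into destination_dir.

    Returns the list of newly created paths, sorted by file name.
    """
    source = Path(source_dir)
    destination = Path(destination_dir)
    # Create the destination folder (and any parents) if it does not exist.
    destination.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.glob(pattern)):
        if path.is_file():
            # copy2 copies the content and tries to preserve file metadata.
            copied.append(Path(shutil.copy2(path, destination / path.name)))
    return copied


if __name__ == "__main__":
    # Hypothetical folder names; replace them with your own.
    if Path("SourcePath").is_dir():
        for new_file in copy_matching("SourcePath", "DestinationPath"):
            print(f"Copied {new_file}")
```

Run it once or a thousand times and it does exactly the same thing, which is the reproducibility point: the script is also a record of what was done.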
3) Unlocking Future opportunities unforeseen today
When Steve Jobs introduced the iPhone in 2007, Steve Ballmer, the Microsoft CEO at the time, not only laughed at a question about the iPhone but also described it as “the most expensive phone in the world”, one that “doesn’t appeal to business customers.” What many could not see at the time was how profoundly the iPhone would change the way we interact with technology and the world. Who could have envisioned that, less than 15 years after the iPhone launch, we would be using our phones to buy or discuss “Bored Ape” pictures (floor price of $150K USD at the time of writing)? The same goes for Python and our day-to-day engineering challenges. In the current Excel-driven mindset, the possibilities are confined by the tools at hand. With a different toolset, where the possibilities are almost endless, who knows what applications each engineer can think of? I am excited to see those, aren’t you?
In this article I showed general scenarios in which engineers end up with closed-source solutions and run into scalability issues as the challenges get more complex and the datasets get larger. My proposed solution is to add Python to our toolbox.
If a task is worth doing, it is worth doing right, and if you think you will need to do it more than once, think about automating it. One of the best tools to help you on that path is Python and its wonderful community. Python is free, open source, easy to use in automation projects, extremely flexible, proven across many industries and disciplines, and constantly evolving.
If you want to see where to start on your Python journey, I wrote about it here: