I have been coding AI systems for most of the past decade, in both Python and Java. While I have been part of Python’s success story, I have also run into its weaknesses. Here is a point-by-point comparison:
- Ease of Learning — Advantage Python: Python is easier to learn. Its syntax is intuitive, and Python handles a lot of complexity internally. Java is not that difficult either, but concepts such as streams, garbage collection, and singletons can stump people. Static typing is one such concept: every variable has a type fixed at compile time, which you normally declare before using the variable. In Python, you don’t have to worry about this at all, which is especially convenient when you are dealing with tensors and undefined or unsized arrays.
- Scripting Potential — Advantage Python: Python is very easy to write scripts in. When you need to do X, it’s easy to throw together something quick and dirty. With Java, even a small task needs a class and a `main` method, and you still tend to think through the whole application end to end. This is a huge advantage for data science programming, which needs to be agile by definition. It also lowers the barrier to entry for new data science enthusiasts, and for business people who want a better understanding of what their data science teams are doing.
- Verbosity — Slight Advantage Python: Since Python sits somewhere between an object-oriented and a scripting language, it is very terse: single statements do a lot. However, this is where I break with the community. I think a good programmer can write compact Java, too, especially with the Streams API introduced in Java 8 and refined in later releases.
- Ecosystem — Tie between Python and Java: Let’s face it, today we all learn from each other on Stack Overflow, GitHub, and Google. The ecosystem matters, and so does the open source community. Java has been popular for longer, but in my experience, for all practical purposes Python and Java have equally robust communities. Many popular open source tools now offer APIs for both languages.
- Portability — Slight Advantage Java: If you recall, Java was designed to do one thing: run on any hardware for which a Java Virtual Machine exists. That is still one of its core strengths. This is only a slight advantage, because in most data science situations software teams control the hardware configuration. However, with emerging constructs such as Function as a Service (FaaS), where the underlying platform is abstracted away, the deployment environment can still surprise Python users, particularly when their packages depend on natively compiled extensions.
- Performance — Advantage Java: I have found Java to perform better in terms of latency and memory footprint (despite garbage-collection pauses). Part of what makes Python so simple is that its statements hide a lot of complexity. That also means the user generally has to settle for general-purpose algorithms, which comes at a high cost in data science, where new techniques are invented every day. Further, Java source is compiled ahead of time to bytecode, which the JVM’s just-in-time compiler then optimizes into native code, whereas the reference CPython implementation largely interprets its bytecode. Many compiler-level optimizations are therefore not available to Python.
- Robustness — Advantage Java: Finally, Java is far more robust for writing enterprise-grade applications. Error handling, resilience, code conformity: so many small things have been added to it over time. Python does not come close. Dynamic typing, for example, is a particular hurdle when debugging, because type errors only surface at run time.
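The dynamic-typing trade-off behind the Ease of Learning and Robustness points can be sketched in a few lines of Python. The function `total` and the sample lists are hypothetical, chosen only for illustration: the same code handles ragged, arbitrarily nested input with no declared shape or element type, but a wrong element type is only discovered as a `TypeError` when execution actually reaches it.

```python
from typing import Any


def total(values: Any) -> float:
    """Sum the numbers in an arbitrarily nested list (hypothetical helper)."""
    # No nesting depth, length, or element type is declared up front;
    # each element is inspected only when the recursion reaches it.
    if isinstance(values, list):
        return sum(total(v) for v in values)
    return values


# Ragged, "unsized" input works without any declarations.
ragged = [1, [2, 3], [[4], [5, 6, 7]]]
print(total(ragged))  # 28

# The flip side: a string slipped into the data is only caught at run time,
# when sum() tries to add it to a number.
mixed = [1, [2, "3"]]
try:
    total(mixed)
except TypeError as err:
    print("Runtime failure:", err)
```

In Java, by contrast, adding a `String` to a `List<Integer>` is a compile-time error, so this class of mistake never reaches a running program. That upfront check is the extra ceremony the Ease of Learning point describes and the safety net the Robustness point values.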
Smart data science teams can get the best of both worlds. For experimentation, agile development, or training models that will not go directly to production (e.g., for transfer learning), Python can be an amazing tool. However, for scalable, reliable production environments where performance matters, Java still has an edge.
This division of labor becomes even more relevant if AI practitioners start to differentiate between Machine Rules and Business Rules, as discussed in this post.