Wednesday, 26 October 2022

Java 20 begins to take shape

Java 20, the next planned version of standard Java, has its first feature proposal: Pattern matching for switch statements and expressions will be previewed for the fourth time in the forthcoming Java SE (Standard Edition) release.

While OpenJDK’s JDK (Java Development Kit) 20 web page still lists no features for the planned release, the Java Enhancement Proposal (JEP) index cites a fourth preview of pattern matching for switch as targeted for JDK 20. JDK 20 is due next March.

Pattern matching for switch statements and expressions is viewed as a mechanism to enable concise and safe expression of complex data-oriented queries. Previously previewed in JDK 17, JDK 18, and JDK 19, the fourth preview would enable a continued co-evolution with Record Patterns, also included as a preview in JDK 19, allowing for continued refinements based on experience and feedback.

The main changes in pattern matching for switch since the third preview include simplified grammar for switch labels and support for inference of type arguments for generic patterns and record patterns in switch statements and expressions.

Record Patterns has been designated for a second preview but no specific targeted release of Java SE has been set for it yet. In addition to pattern matching and Record Patterns, other possible features for JDK 20 include universal generics and string templates.

JDK 20 is set to be a short-term feature release, with only six months of Premier-level support from Oracle. JDK 21, due in September 2023, will be a Long-Term Support release, backed by multiple years of support.

https://www.infoworld.com/

Condensers promise to accelerate Java programs

Project Leyden, an ambitious effort to improve startup time, performance, and footprint of Java programs, is set to offer condensers. A condenser is code that runs between compile time and run time and transforms the original program into a new, faster, and potentially smaller program.

In an online paper published October 13, Mark Reinhold, chief architect of the Java platform group at Oracle, said a program’s startup and warmup times and footprint could be improved by temporarily shifting some of its computation to a point either later in run time or backward to a point earlier than run time. Performance could further be boosted by constraining some computation related to Java’s dynamic features, such as class loading, class redefinition, and reflection, thus enabling better code analysis and even more optimization.

Project Leyden will implement these shifting, constraining, and optimizing transformations as condensers, Reinhold said. Also, new language features will be investigated to allow developers to shift computation themselves, enabling further condensation. However, the Java Platform Specification will need to evolve to support these transformations. The JDK’s tools and formats for code artifacts such as JAR files will also need to be extended to support condensers.

The condensation model offers developers greater flexibility, Reinhold said. Developers can choose which condensers to apply and in so doing choose whether and how to accept constraints that limit Java’s natural dynamism. The condensation model also gives Java implementations considerable freedom. As long as a condenser preserves program meaning and does not impose constraints except those accepted by the developer, an implementation will have wide latitude for optimizing the result.

The best way to improve startup and warmup times and footprint is to identify computation that can simply be eliminated, Reinhold said. Failing that, computation can be shifted backward or forward in time. This concept of shifting computation in time is not new; Java implementations already have many features that do it. For example, compile-time constant folding shifts computation backward in time from run time to compile time, and garbage collection shifts the reclamation of memory forward in time. Other computation-shifting mechanisms, such as ahead-of-time compilation and class-data sharing, are optional.

Project Leyden was under discussion for more than two years before beginning to move forward earlier this year. The project is sponsored by the HotSpot virtual machine and core libraries groups within the Java development domain.

https://www.infoworld.com/

Saturday, 1 October 2022

Why the C programming language still rules

The C programming language has been alive and kicking since 1972, and it still reigns as one of the fundamental building blocks of our software-studded world. But what about the dozens of newer languages that have emerged over the last few decades? Some were explicitly designed to challenge C’s dominance, while others chip away at it as a byproduct of their own popularity.

It's hard to beat C for performance, bare-metal compatibility, and ubiquity. Still, it’s worth seeing how it stacks up against some of the big-name language competition.

C vs. C++

C is frequently compared to C++, the language that—as the name indicates—was created as an extension of C. The differences between C++ and C could be characterized as extensive, or excessive, depending on whom you ask.

While still being C-like in its syntax and approach, C++ provides many genuinely useful features that aren’t available natively in C: namespaces, templates, exceptions, automatic memory management, and so on. Projects that demand top-tier performance—like databases and machine learning systems—are frequently written in C++, using those features to wring every drop of performance out of the system.

Further, C++ continues to expand far more aggressively than C. The forthcoming C++ 23 brings even more to the table, including modules, coroutines, and a modularized standard library for faster compilation and more succinct code. By contrast, the next planned version of the C standard, C2x, adds little and focuses on retaining backward compatibility.

The thing is, all of the pluses in C++ can also work as minuses. Big ones. The more C++ features you use, the more complexity you introduce and the more difficult it becomes to tame the results. Developers who confine themselves to a subset of C++ can avoid many of its worst pitfalls. But some shops want to guard against that complexity altogether. The Linux kernel development team, for instance, eschews C++, and while it's now eyeing Rust as a language for future kernel additions, the majority of Linux will still be written in C.

Picking C over C++ is a way for developers and those who maintain their code to embrace enforced minimalism and avoid tangling with the excesses of C++. Of course, C++ has a rich set of high-level features for good reason. But if minimalism is a better fit for current and future projects—and project teams—then C makes more sense.

C vs. Java

After decades, Java remains a staple of enterprise software development—and a staple of development generally. Java syntax borrows a great deal from C and C++. Unlike C, though, Java doesn’t by default compile to native code. Instead, Java compiles to bytecode, and the JVM’s JIT (just-in-time) compiler compiles that bytecode to native code in the target environment. The JIT engine optimizes routines at runtime based on program behavior, allowing for many classes of optimization that aren’t possible with ahead-of-time compiled C. Under the right circumstances, JIT-compiled Java code can approach or even exceed the performance of C.

And, while the Java runtime automates memory management, it's possible to work around that. For example, Apache Spark optimizes in-memory processing in part by using "unsafe" parts of the Java runtime to directly allocate and manage memory and avoid the overhead of the JVM's garbage collection system.

Java's “write once, run anywhere” philosophy also makes it possible for Java programs to run with relatively little tweaking for a target architecture. By contrast, although C has been ported to a great many architectures, any given C program may still require customization to run properly on, say, Windows versus Linux.

This combination of portability and strong performance, along with a massive ecosystem of software libraries and frameworks, makes Java a go-to language and runtime for building enterprise applications. Where it falls short of C is an area where the language was never meant to compete: running close to the metal, or working directly with hardware.

C code is compiled into machine code, which is executed by the processor directly. Java is compiled into bytecode, which is intermediate code that the JVM then converts to machine code. Further, although Java’s automatic memory management is a blessing in most circumstances, C is better suited for programs that must make optimal use of limited memory resources, because of its small initial footprint.

C vs. C# and .NET

Nearly two decades after their introduction, C# and .NET remain major parts of the enterprise software world. It has been said that C# and .NET were Microsoft’s response to Java—a managed code compiler system and universal runtime—and so many comparisons between C and Java also hold up for C and C#/.NET.

Like Java (and to some extent Python), .NET offers portability across a variety of platforms and a vast ecosystem of integrated software. These are no small advantages given how much enterprise-oriented development takes place in the .NET world. When you develop a program in C#, or any other .NET language, you are able to draw on a universe of tools and libraries written for the .NET runtime. 

Another Java-like .NET advantage is JIT optimization. C# and .NET programs can be compiled ahead of time, as C programs are, but they’re mainly just-in-time compiled by the .NET runtime and optimized with runtime information. JIT compilation allows all sorts of in-place optimizations for a running .NET program that can’t be done in C.

Like C (and Java, to a degree), C# and .NET provide various mechanisms for accessing memory directly. Heap, stack, and unmanaged system memory are all accessible via .NET APIs and objects. And developers can use the unsafe mode in .NET to achieve even greater performance.

None of this comes for free, though. Managed objects and unsafe objects cannot be arbitrarily exchanged, and marshaling between them incurs a performance cost. Therefore, maximizing the performance of .NET applications means keeping movement between managed and unmanaged objects to a minimum.

When you can’t afford to pay the penalty for managed versus unmanaged memory, or when the .NET runtime is a poor choice for the target environment (e.g., kernel space) or may not be available at all, then C is what you need. And unlike C# and .NET, C unlocks direct memory access by default.

C vs. Go

Go syntax owes much to C—curly braces as delimiters and statements terminated with semicolons are just two examples. Developers proficient in C can typically leap right into Go without much difficulty, even taking into account new Go features like namespaces and package management.

Readable code was one of Go’s guiding design goals: Make it easy for developers to get up to speed with any Go project and become proficient with the codebase in short order. C codebases can be hard to grok, as they are prone to turning into a rat’s nest of macros and #ifdefs specific to both a project and a given team. Go’s syntax, and its built-in code formatting and project management tools, are meant to keep those kinds of institutional problems at bay.

Go also features extras like goroutines and channels, language-level tools for handling concurrency and message passing between components. C would require such things to be hand-rolled or supplied by an external library, but Go provides them out-of-the-box, making it far easier to construct software that needs them.

Where Go differs most from C under the hood is in memory management. Go objects are automatically managed and garbage-collected by default. For most programming jobs, this is tremendously convenient. But it also means that any program that requires deterministic handling of memory will be harder to write.

Go does include the unsafe package for circumventing some of Go’s type handling safeties, such as reading and writing arbitrary memory with a Pointer type. But unsafe comes with a warning that programs written with it “may be non-portable and are not protected by the Go 1 compatibility guidelines.”

Go is well-suited for building programs like command-line utilities and network services, because they rarely need such fine-grained manipulations. But low-level device drivers, kernel-space operating system components, and other tasks that demand exacting control over memory layout and management are best created in C.

C vs. Rust

In some ways, Rust is a response to the memory management conundrums created by C and C++, and to many other shortcomings of these languages, as well. Rust compiles to native machine code, so it’s considered on a par with C as far as performance. Memory safety by default, though, is Rust’s main selling point.

Rust’s syntax and compilation rules help developers avoid common memory management blunders. If a program has a memory management issue that violates Rust’s rules, it simply won’t compile. Newcomers to the language—especially those coming from a language like C, which provides plenty of room for such bugs—spend the first phase of their Rust education learning how to appease the compiler. But Rust proponents argue that this near-term pain has a long-term payoff: safer code that doesn’t sacrifice speed.

Rust's tooling also improves on C. Project and component management are part of the toolchain supplied with Rust by default, as with Go. There is a default, recommended way to manage packages, organize project folders, and handle a great many other things that in C are ad hoc at best, with each project and team handling them differently.

Still, what is touted as an advantage in Rust may not seem like one to a C developer. Rust’s compile-time safety features can’t be disabled, so even the most trivial Rust program must conform to Rust’s memory safety strictures. C may be less safe by default, but it is much more flexible and forgiving when necessary.

Another possible drawback is the size of the Rust language. C has relatively few features, even when taking into account the standard library. The Rust feature set is sprawling and continues to grow. As with C++, the larger feature set means more power, but also more complexity. C is a smaller language, but that much easier to model mentally, so perhaps better suited to projects where Rust would be too much.

C vs. Python

These days, whenever the talk is about software development, Python always seems to enter the conversation. After all, Python is “the second best language for everything,” and unquestionably one of the most versatile, with thousands of third-party libraries available.

What Python emphasizes, and where it differs most from C, is favoring speed of development over speed of execution. A program that might take an hour to put together in another language—like C—might be assembled in Python in minutes. On the flip-side, that program might take seconds to execute in C, but a minute to run in Python. (As a good rule of thumb, Python programs generally run an order of magnitude slower than their C counterparts.) But for many jobs on modern hardware, Python is fast enough, and that has been key to its uptake.

Another major difference is memory management. Python programs are fully memory-managed by the Python runtime, so developers don’t have to worry about the nitty-gritty of allocating and freeing memory. But here again, developer ease comes at the cost of runtime performance. Writing C programs requires scrupulous attention to memory management, but the resulting programs are often the gold standard for pure machine speed.

Under the skin, though, Python and C share a deep connection: the reference Python runtime is written in C. This allows Python programs to wrap libraries written in C and C++. Significant chunks of the Python ecosystem of third-party libraries, such as for machine learning, have C code at their core. In many cases, it isn't a question of C versus Python, but more a question of which parts of your application should be written in C and which in Python.

If speed of development matters more than speed of execution, and if most of the performant parts of the program can be isolated into standalone components (as opposed to being spread throughout the code), either pure Python or a mix of Python and C libraries makes a better choice than C alone. Otherwise, C still rules.

C vs. Carbon

Another recent possible contender against both C and C++ is Carbon, a new language currently under heavy development.

Carbon's goal is to be a modern alternative to C and C++, with a straightforward syntax, modern tooling and code-organization techniques, and solutions to problems C and C++ programmers have long faced. It's also meant to provide interoperation with C++ codebases, so existing code can be migrated incrementally. All this is a welcome effort, since C and C++ have historically had primitive tooling and processes compared to more recently developed languages.

So what's the downside? Right now Carbon is an experimental project, not remotely ready for production use. There isn't even a working compiler; just an online code explorer. It's going to be a while before Carbon becomes a practical alternative to C or C++, if it ever does.

Courtesy: https://www.infoworld.com/