Sparrow programming language»Blog
Lucian Radu Teodorescu
I'm still working on the type system. But I find multiple ways of procrastinating. One of them is my work on 'bitcopiable'. And, as this is a good feature, I'm going to drop a few words here about it

The problem

Objects can be self-referentiable. That is, an object, can have a pointer, directly or indirectly to itself. One of the best examples is a node in a circular linked list. The content of the node will contain a pointer that will point to the node itself.

Another example, relatively common, is the following:
datatype Parent
    children: Vector(Child)

datatype Child
    parent: Parent Ptr

Again, a typical Parent object will point to itself. That is what we call a self-referentiable object. An object for which there is a one-to-one mapping between its value and the value of the this pointer.

In general, to be safe, the compiler may assume that all objects are self-referentiable. There are a few exceptions, most notable the primitive types.

As a consequence, for the vast majority of types, the compiler uses pointers to pass objects around. Please see Beyond the type system for more info.

Enter bicopiable

As opposed to self-referentiable types, there are the bitcopiable types. These are objects for which their value are completely independent of the this pointer. We can change their address, but the object remains the same.

There are a few advantages of bitcopiable objects:
  • the compiler can place the object at different addresses, including CPU registers, without invoking any constructors or assign operators

  • we can memcpy or memmove such objects

  • we can maintain ABI compatibility for derived types

  • In general, if the compiler can detect that an object is bitcopiable, it can optimize the code.

    Bitcopiable datatypes in Sparrow

    Recently, Sparrow added a new modifier: bitcopiable. A datatype declared with the bitcopiable modifier is assumes that it can never be self-referentiable, and it allows the compiler to optimize transfer of those objects.

    For example, the compiler can omit the call of copy constructors when dealing with bitcopiable datatypes.
    Another example, a vector of bitcopiable datatypes can directly memcpy its content whenever it resizes. To make this happen, Sparrow has a isBitcopiable(type) function to detect if a type has is bitcopiable or not.

    For generic code, it's also useful that the bitcopiable property is deduced automatically. For this, Sparrow added a autoBitcopiable modifier. A datatype declared with this modifier will be bitcopiable if all the fields are also bitcopiable.

    This is used by standard tuples. A tuple is bitcopiable if all the dependent types are also bitcopiable.

    Bitcopiable support is a very small feature of Sparro, but it can lead to some nice optimizations.
    Lucian Radu Teodorescu
    Over the holidays there was a big debate on compilation time in the C++ community. It started with a blog post by Aras Pranckevičius: "Modern" C++ Lamentations. The post picks on the Standard Ranges by Eric Niebler, that shows how you would implement Pythagorean Triples with Ranges, a new addition to C++20 standard.

    At the heart of Aras' argument was the following two facts:
    • the range-based Pythagorean Triples is much longer and more complex than a simple C-style implementation
    • compilation time increases from 0.071s to 3.02s (when going from C-style to range-based implementation -- that is 42 times slower

    The "modern" C++ tends to be more complex and less usable.

    As Sparrow has for some time ranges as a core usability feature, and compilation time is not necessarily a strong point of Sparrow, I feel obliged to state my position on this topic; it's been a bit more than a month since the heat started, so this is a good opportunity to look at the problem calmly.

    Here is the summary:
    1. as always, I tend to find a balance between the two extremes
    2. the balance is more shifted towards Aras' side

    First, I think most of us agree that Eric's example is a bad one. That doesn't mean that there aren't a lot of good examples in which Ranges can help a lot. And yes, the programmer's mind is still the bottleneck, not the compilation time. That's why we are not programming in assembly anymore. Within reasonable limits, we should focus on improving the developer's productivity; that is, as long as the improvements we make are greater compared to the problems we introduce/aggravate.

    And, that is exactly what Sparrow aims to do: without sacrificing performance and, to some extent, compilation time, improve the "naturalness" of programming---make it easier for programmers to write quality code and enjoy programming.

    So far, in terms of compilation-time, Sparrow is not doing that great. Here are a few drawbacks:
    1. current compilation times are not great
    2. Sparrow relies on a lot of generics
    3. Sparrow relies on hyper-metaprogramming for some of its features

    On the first point, things are excusable: Sparrow did not yet reach maturity; very little effort spent in making the compiler faster.

    The current work on moving towards value semantics will make things faster to compile. Previously, most of the values were held by reference. As a direct consequence, each time the program needs to access a compile-time value, the compiler would have to reach the compile-time execution engine and dereference a compile-time pointer. Moving towards value semantics, we reduce the need to call the compile-time execution engine.

    Using C++-style generics is also a potential problem for compilation-time. On this topic, the long-term goal is to switch to "one compilation" of a generic. That is, the generic is compiled only once, regardless of how many times the generic is actually instantiated. The tricky part is to keep the same code specialization characteristic the C++ has, that makes generics fast. I have some ideas on how to make this possible; describing them here would be beyond the scope of this article.

    Finally, there is the problem of hyper-metaprogramming. The more the users will use it, the slower the compilation-time will get. But here, we have another trick that we can apply: external compile-time modules. Once a programmer creates a compile-time functionality, it should be able to instruct the compiler to compile everything to a shared library and make sure that the library is loaded into the compiler. This way, even if the user writes metaprograms in Sparrow, they can be optimized out just like any other compiler module. It's basically extending the compiler from within Sparrow. This is again not yet implemented.

    So yes, Sparrow allows one to shoot oneself in the foot but provides means to avoid these types of issues (well, at least that's the intention).

    And since we are discussing compilation-time, there is also one thing on Sparrow's roadmap that is worth noticing. Sparrow aims to stay away from the "include" model of C and C++. Instead, it aims at the following behavior:
    • the content of a file is compiled only once
    • dependencies between files are controlled at a more detailed level
    • changing something in a file, only leads to the recompilation of the declarations that actually depend on the change
      • if one changes the body of a function, without affecting the interface, no recompilation is needed for other modules
      • if one changes the signature of a function, only the declarations that depend on that function would be recompiled
      • if one changes the size of a datatype, only the declarations dependent on that size would be recompiled
    • linking would, of course, have to be done for every change, just like in C++

    With that, I would say the Sparrow has a good plan of making the compile-time problem go away. If all these are implemented, then Sparrow would allow to raise the level of abstraction, make programming easier, without compromising compilation time. All that, of course, if we don't write over-complex code, just for the sake of showing off some features :p.
    Lucian Radu Teodorescu
    First, the slow part:
    • personal problems for the last two months -- life is hard
    • changing the type system is slightly harder than I originally thought
    • adding unit tests & refactoring along the way

    There is little you can do for the first bullet point. Therefore, I'll focus on the other two bullet points.

    Before starting the type system work, Sparrow had zero unit-tests. With this effort, I was convinced that we need unit-tests to make considerable progress in Sparrow. That took a lot of work, but it's starting to show signs that it pays off. I have now, some rough numbers that I can use to see if it pays off.

    Previously, when I developed a more complex features, I encountered one or more road-blocks. I used to spend, let's say, 20 hours to avoid a road-block. Because of the complexity of the software, I didn't fully understand what the compiler is supposed to do, so I had a lot of wrong assumptions, and also there were a lot of bugs in the code. To overcome such a road-block, I typically had some kind of special-condition check that would fix the case I was working on, without affecting the already working functionality. But this is the worst thing that one can do: it just accumulates complexity, making it even harder to solve a problem next time.

    With the new mindset, whenever I'm encountering a road-block, I refactor the code, and I unit test it. This costs me around 25 hours of work (very rough estimates). But after the refactoring & testing, I spend about 4 hours or less to avoid the road-block.

    Doing the math yields a great advantage for refactoring & unit testing. I'll gain time each time I encounter 2 road-blocks in the same part of the code. Yes, it feels slower to refactor & add unit tests for everything, but in the grand scheme of things, it actually makes things faster.

    I was able to see this slow/fast perspective with this feature. First, I spent a lot of time trying to avoid a road-block. Then, I go back and refactor the code. Then all the road-blocks can be easily overcome. I have unit-tests that will tell me each mistake I'm making.

    I hope that, it won't be long until I have unit-tests for most of the compiler logic. Then, any change that I'm making will be much faster. And the good news is that I don't have to bite the costs all at once. I can write unit-tests while I'm working on different parts of the compiler. So far I have unit-tests for basic nodes logic, basic type logic, type conversion logic and Feather types. Right now I'm working on refactoring the SparrowFrontend nodes and unit-test the generics nodes.

    May unit-tests keep you on the right path!
    Lucian Radu Teodorescu
    For some time now I've started to change the type system of Sparrow. I wanted to add immutability as a first-choice instead of the current "everything is mutable". For more details see:

    The problem is that I'm moving too slow. It takes me a long time to make changes that can be seen as minor. I make one change here which seems good, and immediately I find errors in some other part. I fix that error, and a new error appear. It feels like going in circles. Actually, it I step back a bit, I realize that:
    • I don't understand that code that I've written some time ago
    • the number of cases that I need to consider exceeds by a large margin the number of things that I can hold in my head at a given time
    • the code related to typing rules is too fragile
    • it seems that the code contains a lot of "this will fix my immediate case" logic

    That is the essential complexity that Brooks talks about, and sadly, that is a poor written code.

    Ok, so what do I do now? I cannot let this happen forever. Here are some thoughts on addressing this:
    • the way to move forward on the immediate term is to move in small increments, and for each increment, ensure that nothing is broken; making a big rewrite is going to take forever
    • after each small step, ensure that we have proper unit-test, so that we can properly assess problems
    • with each functionality change, also try a small refactoring to simplify the logic
    • think of long term redesigns that need to be addressed for a state-of-the-art implementation

    That's it for now... going back to fighting complexity. Wish me luck.
    Lucian Radu Teodorescu
    Dear all,

    I am a strong believer in the value generated by unit-tests. We programmers are not as great as we would like to be. We introduce subtle bugs everywhere (if not obvious ones), we make a lot of copy-paste errors, we make a lot of bad-replace errors, and we claim to understand the code much more than we actually do. Not to mention the fact that our memory is not as good as we think it is; when we don't make mistakes, we simply don't fully understand the code, even our code. Our mind is not that powerful to keep the ever-more complex code structures in our head.

    You can see this even from the development of Sparrow. While trying to speed things up, I constantly ignored the testing aspect of development. And, I've often found that I've spent more time investigating subtle bugs than actually developing Sparrow features. That means, I've spent more time because of the lack of testing than I would have spent on writing unit-tests.

    Another challenge that delayed my adoption of unit-tests was the complexity of the testing problem: it's hard to create good unit-tests for large abstract syntax trees, where small changes can have ripple effects. Simple unit-test are simply a waste of time. I need more evolved unit-testing methods. That's why I'm using property based testing. This is a method of testing inspired by QuickCheck library from Haskell, and it's based on checking the postconditions of the tested code is run with randomly-generated inputs. I'm using RapidCheck C++ library as my testing framework.

    Just to exemplify the power of property based testing: During some internal training we made at my company, I've asked two groups of good software engineers "to write unit-tests for a stable sort algorithm" (in two consecutive years). Then, I've made an implementation of stable sort that contained 5 intentional bugs. Most of the unit tests that I received didn't catch any of these bugs, and no unit-test submission would catch more than 2 bugs. Some bugs were simply not caught by the sum of all unit-tests received. Then, in 15-30 minutes we were able to code a set of unit-tests that would catch all these bugs, and moreover it would (almost guarantee) that there are no bugs.

    Of course, not all the problems are as simple to test as stable-sorting.

    I know that the Sparrow code is particularly not easy to unit-test, but I know that we can easily increase the confidence of the code to a much higher degree. I've already found some bugs, and clarified some of the properties of my code.

    So that's what I've been doing for the last month or so. I simply realized that I cannot change the complete type system without some basic form of validation that I'm doing the right thing. To a bug-free code! LucTeo