Tuesday, September 30, 2014

Should we avoid string?

No overview of memory issues in C# and Unity would be complete without mentioning strings. From a memory standpoint, strings are strange because they are both heap-allocated and immutable. When you concatenate two strings (be they variables or string-constants) as in:
void Update()
    string string1 = "Two";
    string string2 = "One" + string1 + "Three";
... the runtime has to allocate at least one new string object that contains the result. In String.Concat() this is done efficiently via an external method called FastAllocateString(), but there is no way of getting around the heap allocation (40 Bytes on my system in the example above). If you need to modify or concatenate strings at runtime, use System.Text.StringBuilder.

Should you avoid closures and LINQ?

You probably know that C# offers anonymous methods and lambda expressions (which are almost but not quite identical to each other). You can create them with the delegate keyword and the => operator, respectively. They are often a handy tool, and they are hard to avoid if you want to use certain library functions (such as List<T>.Sort()) or LINQ.
Do anonymous methods and lambdas cause memory leaks? The answer is: it depends. The C# compiler actually has two very different ways of handling them. To understand the difference, consider the following small chunk of code:
int result = 0;
void Update()
    for (int i = 0; i < 100; i++)
        System.Func<int, int> myFunc = (p) => p * p;
        result += myFunc(i);
As you can see, the snippet seems to create a delegate myFunc 100 times each frame, using it each time to perform a calculation. But Mono only allocates heap memory the first time the Update() method is called (52 Bytes on my system), and doesn't do any further heap allocations in subsequent frames. What's going on? Using a code reflector (as I'll explain in the next blog post), one can see that the C# compiler simply replaces myFunc by a static field of type System.Func<intint> in the class that contains Update(). This field gets a name that is weird but also revealing: f__am$cache1 (it may differ somewhat on you system). In other words, the delegator is allocated only once and then cached.
Now let's make a minor change to the definition of the delegate:
        System.Func<int, int> myFunc = (p) => p * i++;
By substituting 'i++' for 'p', we've turned something that could be called a 'locally defined function' into a true closure. Closures are a pillar of functional programming. They tie functions to data - more precisely, to non-local variables that were defined outside of the function. In the case of myFunc, 'p' is a local variable but 'i' is non-local, as it belongs to the scope of the Update() method. The C# compiler now has to convert myFunc into something that can access, and even modify, non-local variables. It achieves this by declaring (behind the scenes) an entirely new class that represents the reference environment in whichmyFunc was created. An object of this class is allocated each time we pass through the for-loop, and we suddenly have a huge memory leak (2.6 Kb per frame on my computer).
Of course, the chief reason why closures and other language features where introduced in C# 3.0 is LINQ. If closures can lead to memory leaks, is it safe to use LINQ in your game? I may be the wrong person to ask, as I have always avoided LINQ like the plague. Parts of LINQ apparently will not work on operating systems that don't support just-in-time compilation, such as iOS. But from a memory aspect, LINQ is bad news anyway. An incredibly basic expression like the following:
int[] array = { 1, 2, 3, 6, 7, 8 };

void Update()
    IEnumerable<int> elements = from element in array
                    orderby element descending
                    where element > 2
                    select element;
... allocates 68 Bytes on my system in every frame (28 via Enumerable.OrderByDescending() and 40 viaEnumerable.Where())! The culprit here isn't even closures but extension methods to IEnumerable: LINQ has to create intermediary arrays to arrive at the final result, and doesn't have a system in place for recycling them afterwards. That said, I am not an expert on LINQ and I do not know if there are components of it that can be used safely within a real-time environment.

Should you avoid foreach loops?

A common suggestion which I've come across many times in the Unity forums and elsewhere is to avoidforeach loops and use for or while loops instead. The reasoning behind this seems sound at first sight. Foreach is really just syntactic sugar, because the compiler will preprocess code such as this:
foreach (SomeType s in someList)
...into something like the the following:
using (SomeType.Enumerator enumerator = this.someList.GetEnumerator())
    while (enumerator.MoveNext())
        SomeType s = (SomeType)enumerator.Current;
In other words, each use of foreach creates an enumerator object - an instance of theSystem.Collections.IEnumerator interface - behind the scenes. But does it create this object on the stack or on the heap? That turns out to be an excellent question, because both are actually possible! Most importantly, almost all of the collection types in the System.Collections.Generic namespace (List<T>, Dictionary<K, V>, LinkedList<T>, etc.) are smart enough to return a struct from from their implementation of GetEnumerator()). This includes the version of the collections that ships with Mono 2.6.5 (as used by Unity).
[EDIT] Matthew Hanlon pointed my attention to an unfortunate (yet also very interesting) discrepancy between Microsoft's current C# compiler and the older Mono/C# compiler that Unity uses 'under the hood' to compile your scripts on-the-fly. You probably know that you can use Microsoft Visual Studio to develop and even compile Unity/Mono compatible code. You just drop the respective assembly into the 'Assets' folder. All code is then executed in a Unity/Mono runtime environment. However, results can still differ depending on who compiled the code! foreach loops are just such a case, as I've only now figured out. While both compilers recognize whether a collection's GetEnumerator() returns a struct or a class, the Mono/C# has a bug which 'boxes' (see below, on boxing) a struct-enumerator to create a reference type.
So should you avoid foreach loops?
  • Don't use them in C# code that you allow Unity to compile for you.
  • Do use them to iterate over the standard generic collections (List<T> etc.) in C# code that you compile yourself with a recent compiler. Visual Studio as well as the free .NET Framework SDK are fine, and I assume (but haven't verified) that the one that comes with the latest versions of Mono and MonoDevelop is fine as well.
What about foreach-loops to iterate over other kinds of collections when you use an external compiler? Unfortunately, there's is no general answer. Use the techniques discussed in the second blog post to find out for yourself which collections are safe for foreach[/EDIT]