Pure Functions C#

As software becomes more complex, it becomes more important to structure it well. Well-structured software is easier to write and debug, and it provides a collection of modules that can be reused to reduce future programming costs. Functional programming is one way to achieve all of this.

In this series of posts on functional programming, I will try to demonstrate why functional programming matters and how C# programmers can exploit its advantages.

Is C# A Functional Language?

We tend to see the introduction of more functional features with every new release of C# language, enabling a multiparadigm programming style in the language, for example:

  • C# 2.0 allows programmers to pass and return functions as values for higher-order functions using delegates, and has limited support for anonymous delegates.

  • C# 3.0 (shipped with .NET Framework 3.5) improved support for anonymous functions and closures by adding lambda expressions.

  • LINQ can be considered C#’s own flavor of Haskell’s list comprehensions. Haskell is a statically typed, purely functional programming language with type inference and lazy evaluation, features that LINQ’s deferred execution and C#’s type inference partly mirror.

  • Anonymous types look like an approximation of ML records. ML is a general-purpose functional programming language.

I wouldn’t necessarily consider some of those features mentioned above as exclusive to functional programming languages, but it’s pretty clear that the C# developers have taken a lot of inspiration from functional programming languages in the past few years.

For a more in-depth explanation, see “Is C#7 starting to look like a functional language?”

However, the adoption of functional programming in the C# community has been slow.

One reason is that most books and articles explain functional techniques and concepts with examples from mathematics or computer science, while most programmers work on business applications. This creates a domain gap.

Let’s see why functional programming matters and why you should try to adopt it.

Benefits Of Functional Programming

  • Functional programming (FP) is more declarative in style.

  • We know for sure what a piece of code does and doesn’t do, thus we can change our code with more confidence.

  • The other major benefit is concurrency. A program written in the imperative style may work well in a single-threaded implementation but cause all sorts of bugs, such as race conditions, when concurrency comes in. Functional code offers much better guarantees in concurrent scenarios.

  • There is no hidden state (private fields or static fields) and no dependency on the outside world. This alone makes testing much easier, because we don’t have to worry about mocking dependencies or other hacks to prepare the functionality for the test.

  • In functional programming, we immediately learn a lot about what a function does just by looking at its signature. Sometimes that is all we need to know about a function, and that is far clearer and faster than reading through the behavior of methods in object-oriented style.

To understand these benefits, we must first understand what “FP functions”, or “pure functions”, are and why they matter.

What Are FP Functions / Pure Functions?

A Pure Function Is A Map Between Two Sets

In mathematics, and also in functional programming, a function is a map between two sets, respectively called the domain and codomain. That is, given an element from its domain, a function yields an element from its codomain.

public static class Math
{
    public static int Add(int a, int b)
    {
        return a+b;
    }
}

In the above example, the domain and codomain are both built on the int type, whose values range from -2,147,483,648 to 2,147,483,647. This is easy to write because the language itself already validates the input, but that is not always the case when you write business logic. Let’s see a different kind of example:

public static string GetDomains(string email)
{
    if (email is null)
    {
        throw new ArgumentNullException(nameof(email));
    }

    return email.Split('@').LastOrDefault();
}

In the above example, the method has more than one possible outcome: it can return the domain part of the address, return a string that is not a domain at all (for an input without '@', Split returns the whole input unchanged), or throw an exception. And this will be the case for every function that operates on an email held in a plain string. One way to improve this program is to create a data object that represents our domain. Let’s see how:

using System;
using System.Text.RegularExpressions;

namespace Example1
{
    public class Email
    {
        // Anchored with ^ and $ so the whole string must match, not just a part of it.
        // IgnoreCase lets addresses with upper-case letters pass as well.
        private static readonly Regex EmailRegex = new Regex(
            @"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$",
            RegexOptions.IgnoreCase);

        public string Value { get; }

        public Email(string email)
        {
            if (email is null)
            {
                throw new ArgumentNullException(nameof(email));
            }

            if (!IsValidEmailAddress(email))
            {
                // The message comes first; the parameter name is the second argument.
                throw new ArgumentException("Invalid email address.", nameof(email));
            }

            Value = email;
        }

        public static bool IsValidEmailAddress(string email)
        {
            return EmailRegex.IsMatch(email);
        }
    }
}

In this implementation, Email still uses a string in its underlying representation, but the constructor ensures that Email can only be instantiated with a valid value.

The Email type is being created precisely to represent the domain of the GetDomains function, which can now be rewritten as follows:

public static string GetDomains(Email email) => email.Value.Split('@').LastOrDefault();

This new implementation has several advantages. You’re guaranteeing that only valid values can be given; GetDomains no longer causes runtime errors; and the concern of validating the email value is captured in the constructor of the Email type, removing the need for duplicating validation wherever an Email is processed.
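As a rough illustration of how such a type behaves at its boundary, the sketch below repeats a trimmed-down Email type together with a small helper. The validation rule (exactly one '@' with text on both sides), the EmailDemo class, and the TryCreate helper are hypothetical stand-ins chosen for brevity; they are not the regex-based validation from the listing above.

```csharp
using System;
using System.Linq;

// Simplified stand-in for the Email type; the validation rule below is an
// assumption made for this sketch, not the regex used in the article.
public sealed class Email
{
    public string Value { get; }

    public Email(string email)
    {
        if (email is null)
            throw new ArgumentNullException(nameof(email));
        if (!IsValid(email))
            throw new ArgumentException("Invalid email address.", nameof(email));
        Value = email;
    }

    // Hypothetical check: exactly one '@' with non-empty text on both sides.
    private static bool IsValid(string email)
    {
        var parts = email.Split('@');
        return parts.Length == 2 && parts[0].Length > 0 && parts[1].Length > 0;
    }
}

public static class EmailDemo
{
    // Every Email instance is known to contain a domain part.
    public static string GetDomains(Email email) => email.Value.Split('@').Last();

    // Invalid input is rejected once, at construction time.
    // ArgumentNullException derives from ArgumentException, so null is covered too.
    public static bool TryCreate(string raw, out Email email)
    {
        try { email = new Email(raw); return true; }
        catch (ArgumentException) { email = null; return false; }
    }
}
```

With this in place, EmailDemo.GetDomains(new Email("john@yahoo.com")) yields "yahoo.com", while EmailDemo.TryCreate("not-an-email", out _) returns false before any function that works on emails ever sees the bad value.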

The Output Of A Pure Function Is Determined Exclusively By Its Input

The value that a function yields is determined exclusively by its input. You’ll see that this isn’t always the case with functions in programming. Let’s see why:

using System;

namespace Common
{
    public class Person
    {
        public string Name { get; set; }
        public string Email { get; set; }

        public DateTime DateOfBirth { get; set; }

        public bool IsMinor()
        {
            var timespan = DateTime.Now - DateOfBirth;
            var age = (int)timespan.TotalDays / 365;
            return age < 18;
        }

        public override string ToString()
        {
            return $"Name:{Name}";
        }
    }
}

Is the IsMinor method pure? The result of IsMinor will depend on the current date, so, clearly, the answer is no! What kind of side effect are we facing here? It’s I/O: DateTime.Now queries the system clock, which is not in the context of the program.

Functions that perform I/O are difficult to test. For example, the following test passes as I’m writing this, but it will start to fail in mid-December 2019, around John’s 18th birthday:

// Requires: using System.Globalization;
var john = new Person()
{
    Name = "John Doe",
    Email = "john@yahoo.com",
    // InvariantCulture keeps the "MM/dd/yyyy" parse independent of regional settings.
    DateOfBirth = DateTime.ParseExact("12/15/2001",
    "MM/dd/yyyy", CultureInfo.InvariantCulture)
};

Assert.AreEqual(true, john.IsMinor());

You can address this issue in the following manner:

public bool IsMinor(DateTime on)
{
    var timespan = on - DateOfBirth;
    // Approximate age in years, using the average length of a Gregorian year.
    var age = timespan.TotalDays / 365.2425;
    return age < 18;
}

Now IsMinor no longer depends on the system clock: the date is an immutable value supplied by the caller through the on parameter. You’ve effectively pushed the side effect of reading the current date outwards.

However, the method still depends on state that is not visible in its signature: DateOfBirth. The programming constructs we use to represent functions all have access to a “context”; an instance method, for example, has access to instance fields and properties.

As we discussed, pure functions closely resemble mathematical functions: they do nothing other than compute an output value based on their input values. They never mutate global state, where “global” means any state that’s visible outside of the function’s scope.

A private instance field is considered global because it’s visible from all methods within the class.

So we can refactor our code even further to attain true purity.

public static class DateTimeExtension
{
    public static bool IsMinor(this DateTime dob, DateTime on)
    {
        var timespan = on - dob;
        // Approximate age in years, using the average length of a Gregorian year.
        var age = timespan.TotalDays / 365.2425;
        return age < 18;
    }
}

Now we can write the test case as follows:

// Requires: using System.Globalization;
var john = new Person()
{
    Name = "John Doe",
    Email = "john@yahoo.com",
    // InvariantCulture keeps the "MM/dd/yyyy" parse independent of regional settings.
    DateOfBirth = DateTime.ParseExact("12/15/2001",
    "MM/dd/yyyy", CultureInfo.InvariantCulture)
};

var on = DateTime.ParseExact("12/14/2019",
    "MM/dd/yyyy", CultureInfo.InvariantCulture);

Assert.AreEqual(true, john.DateOfBirth.IsMinor(on));

And a client can use the code in the following manner:

foreach (var person in people.Where(p => p.DateOfBirth.IsMinor(DateTime.Now)))
{
    Console.WriteLine(person);
}

Domain And Codomain Constitute A Function’s Interface

The types for the domain and codomain constitute a function’s interface, also called its type or signature. You can think of this as a contract: a function signature declares that, given an element from the domain, it will yield an element from the codomain.

In the examples above, we defined this interface using the method’s return type and the types of its parameters.

public static int Add(int a, int b)

Here both our domain and codomain are defined in terms of the int type.

public static string GetDomains(Email email) => email.Value.Split('@').LastOrDefault();

Here our domain is defined by a custom type, Email, and the codomain is a .NET built-in type, string.

public static bool IsMinor(this DateTime dob, DateTime on)

Here our domain is built on the .NET DateTime type, while the codomain is bool: the result is always true or false.

How You Can Define Functions In C#

There are several language constructs in C# that you can use to represent functions:

Methods

Methods are the most common and idiomatic representation for functions in C#. For example, the System.Math class includes methods representing many common mathematical functions.

Methods can represent functions, but they also fit into the object-oriented paradigm—they can be used to implement interfaces, they can be overloaded, and so on.

The constructs that really enable you to program in a functional style are delegates and lambda expressions.

Delegates

Delegates are type-safe function pointers. Type-safe here means that a delegate is strongly typed: the types of the input and output values of the function are known at compile-time, and consistency is enforced by the compiler.
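To make the idea of a type-safe function pointer concrete, here is a minimal sketch. The names DelegateDemo, BinaryOp, and Apply are hypothetical and exist only for this illustration.

```csharp
using System;

public static class DelegateDemo
{
    // A custom delegate type: any method that takes two ints and returns an int.
    public delegate int BinaryOp(int a, int b);

    public static int Add(int a, int b) => a + b;
    public static int Multiply(int a, int b) => a * b;

    // A higher-order function: the operation to perform is passed in as a value.
    public static int Apply(BinaryOp op, int a, int b) => op(a, b);
}
```

Calling DelegateDemo.Apply(DelegateDemo.Add, 2, 3) gives 5, and passing DelegateDemo.Multiply instead gives 6. Passing a method with a different signature, for example one that takes a string, is rejected by the compiler, which is what type-safe means here.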

The Func And Action Delegates

The .NET framework includes a couple of delegate “families” that can represent pretty much any function type:

  • Func<R> represents a function that takes no arguments and returns a result of type R.
  • Func<T1, R> represents a function that takes an argument of type T1 and returns a result of type R.
  • Func<T1, T2, R> represents a function that takes a T1 and a T2 and returns an R.

And so on.

Since the introduction of Func, it has become rare to use custom delegates. For example, instead of declaring a custom delegate like this,

delegate Greeting Greeter(Person p);

you can just use the type:

Func<Person, Greeting>

The type of Greeter in the preceding example is equivalent to, or “compatible with,” Func<Person,Greeting>. In both cases, it’s a function that takes a Person and returns a Greeting.

There’s a similar delegate family to represent actions—functions that have no return value, such as void methods:

  • Action represents an action with no input arguments.
  • Action<T1> represents an action with an input argument of type T1.
  • Action<T1, T2> and so on represent an action with several input arguments.

The evolution of .NET has been away from custom delegates, in favor of the more general Func and Action delegates.
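The two families can be sketched side by side as follows. The FuncActionDemo class and its members are hypothetical names, and plain strings stand in for the Person and Greeting types, which are not defined in this article.

```csharp
using System;
using System.Collections.Generic;

public static class FuncActionDemo
{
    // Func<string, string>: takes a name and returns a greeting.
    public static readonly Func<string, string> Greet = name => $"Hello, {name}!";

    // Func<int, int, int>: the last type argument is always the result type.
    public static readonly Func<int, int, int> Add = (a, b) => a + b;

    // An Action<string> returns nothing; it exists only for its side effect,
    // which here is appending to a list.
    public static List<string> Log { get; } = new List<string>();
    public static readonly Action<string> Record = message => Log.Add(message);
}
```

Greet and Add are pure: FuncActionDemo.Greet("Ann") is always "Hello, Ann!". Record is not, since each call changes Log. The split between Func and Action therefore gives a first hint about where the side effects in a program live.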

Lambda Expressions

Lambda expressions, called lambdas for short, are used to declare a function inline. For example, sorting a list of numbers alphabetically can be done with a lambda like so.

var list = Enumerable.Range(1, 10).Select(i => i * 3).ToList();
// list => [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
list.Sort((l, r) => l.ToString().CompareTo(r.ToString()));
// list => [12, 15, 18, 21, 24, 27, 3, 30, 6, 9]

If your function is short and you don’t need to reuse it elsewhere, lambdas offer the most attractive notation. Also notice that in the preceding example, the compiler not only infers the types of l and r to be int, it also converts the lambda to the delegate type Comparison<int> expected by the Sort method, given that the provided lambda is compatible with this type.

Dictionaries

Dictionaries are fittingly also called maps (or hashtables); they’re data structures that provide a very direct representation of a function. They literally contain the association of keys (elements from the domain) to values (the corresponding elements from the codomain).

We normally think of dictionaries as data, so it’s enriching to change perspectives for a moment and consider them as functions. Dictionaries are appropriate for representing functions that are completely arbitrary, where the mappings can’t be computed but must be stored exhaustively.
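As a small sketch of this change in perspective, the following wraps a dictionary in a Func. The DictionaryDemo class and the word list are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    // The keys are the domain; the values are the corresponding codomain elements.
    private static readonly Dictionary<string, string> FrenchByEnglish =
        new Dictionary<string, string>
        {
            ["one"] = "un",
            ["two"] = "deux",
            ["three"] = "trois",
        };

    // The same mapping exposed as a function.
    // The indexer throws KeyNotFoundException for keys outside the domain.
    public static readonly Func<string, string> Translate = word => FrenchByEnglish[word];
}
```

DictionaryDemo.Translate("two") returns "deux". There is no formula that computes the translation from the input, so the mapping has to be stored exhaustively, which is exactly the situation described above.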

Why Function Purity Matters

The deterministic nature of Pure functions / FP functions (that is, the fact that they always return the same output for the same input) has some interesting consequences.

Pure functions are easy to test and to reason about. Furthermore, the fact that outputs only depend on inputs means that the order of evaluation isn’t important. Whether you evaluate the result of a function now or later, the result will not change.

This means that the parts of your program that consist entirely of pure functions can be optimized in a number of ways using techniques such as Parallelization, Lazy evaluation, Memoization. Using these techniques with impure functions can lead to nasty bugs. For these reasons, FP advocates that pure functions should be preferred whenever possible.
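Memoization, one of the techniques just mentioned, can be sketched in a few lines. The Memoize helper below is a hypothetical utility written for this illustration, not a framework API, and it assumes the function passed in is pure.

```csharp
using System;
using System.Collections.Concurrent;

public static class MemoizeDemo
{
    // Wraps f so that the result for each distinct input is cached and reused.
    // This is only correct when f is pure: for an impure f, the cached value
    // could differ from what a fresh call would return.
    public static Func<T, R> Memoize<T, R>(Func<T, R> f)
    {
        var cache = new ConcurrentDictionary<T, R>();
        return x => cache.GetOrAdd(x, f);
    }
}
```

For example, after var square = MemoizeDemo.Memoize<int, int>(x => x * x), calling square(4) twice runs the lambda once and returns 16 both times. Under contention, GetOrAdd may invoke f more than once for the same key, which is harmless precisely because f is assumed to be pure.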

Let’s see what happens if we naively apply parallelization to an impure function.

Let’s say you want to format a list of strings, like the one below, as a numbered list, and you want to use the parallel features of .NET to do it.

var names = new List<string>
{
    "Lorena Villagomez",
    "Jennine Beaty",
    "Paulina Vannatter",
    "Sherell Hoots",
    "Rosalee Fleurant",
    "Ervin Hamel",
    "Ulysses Adkisson",
    "Tawanna Winward",
    "Brigette Masterson",
    "Joelle Ranieri",
    ...
};

The desired output looks like this:

1) Lorena Villagomez
2) Jennine Beaty
3) Paulina Vannatter
4) Sherell Hoots
5) Rosalee Fleurant
6) Ervin Hamel
7) Ulysses Adkisson
8) Tawanna Winward
9) Brigette Masterson
10) Joelle Ranieri
...

To do this, you might write a program like this:

var count = 0;
var list = names.AsParallel().Select(item =>
{
    count++;
    return $"{count}) {item}";
});
// Assumes: using static System.Console;
WriteLine(String.Join("\n", list));

Here AsParallel() enables parallelization of a query by binding it to PLINQ. PLINQ can achieve significant performance improvements over legacy code for certain kinds of queries, often just by adding the AsParallel query operation to the data source. For more information, see Parallel LINQ (PLINQ).

The count++ statement in the lambda increments the shared counter variable, so the parallel version will have multiple threads reading and updating the counter. As is well known, ++ is not an atomic operation, and because there’s no locking in place, we’ll lose some of the updates and end up with an incorrect result.

If you test this approach with a large enough input list, you’ll get a result like this:

1) Lorena Villagomez
2) Jennine Beaty
3) Paulina Vannatter
4) Sherell Hoots
5) Rosalee Fleurant
6) Ervin Hamel
7) Ulysses Adkisson
8) Tawanna Winward
9) Brigette Masterson
10) Joelle Ranieri
11) Min Turlington
12) Ruthanne Trueblood
13) Adele Feather
39) Melva Piedra
14) Rema Carpino
27) Kareen Mcalexander
40) Joesph Rabun
15) Takako Danz
...

You can see that the numbering goes wrong after the 13th item.

This will look pretty familiar if you have some multithreading experience. Because multiple threads are reading and writing the counter at the same time, some of the updates are lost, and items receive their numbers in whatever order the threads happen to process them.

Here, the lambda passed to Select violates the pure function rule: “The output of a pure function is determined exclusively by its input”.

var list = names.AsParallel().Select(item => 
{
    count++;
    return $"{count}) {item}";
});

Because our Select lambda depends on count, an external closed-over variable that it also mutates, the lambda is impure.

You probably know that this could be fixed by using a lock or the Interlocked class when incrementing the counter. But locking is an imperative construct that we’d rather avoid when coding functionally.
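For comparison, the Interlocked variant might look like the sketch below. The InterlockedDemo class and its Number method are hypothetical names; the only framework calls used are AsParallel, Select, ToList, and Interlocked.Increment.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class InterlockedDemo
{
    public static List<string> Number(IEnumerable<string> names)
    {
        var count = 0;
        return names.AsParallel()
            // Interlocked.Increment is atomic and returns the incremented value,
            // so no update is lost and every item gets a distinct number.
            .Select(item => $"{Interlocked.Increment(ref count)}) {item}")
            .ToList();
    }
}
```

This removes the lost updates, but the lambda is still impure: which item receives which number depends on thread scheduling, so the first name is not guaranteed to be numbered 1. The numbers are unique, yet the result is still not the numbered list we wanted.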

One possible way to avoid the pitfalls of concurrent updates is to remove the problem at the source: don’t use shared state to begin with. How this can be done will vary with each scenario, but I’ll show you a solution for the current scenario that will enable us to format the list in parallel.

What if instead of updating a running counter, you generate a list of all the counter values you need, and then pair items from the given list with items from the list of counters?

For the list of integers, you can use Range, a convenience method on Enumerable. The operation of pairing two parallel lists is common in FP, and it’s called Zip.

Using Range and Zip, you can rewrite the program as follows.

// Assumes: using static System.Linq.Enumerable; using static System.Console;
var linenos = Range(1, names.Count);
var list = linenos.AsParallel()
    .Zip(names.AsParallel(), (lineno, item) => $"{lineno}) {item}");
WriteLine(String.Join("\n", list));

In this case, ParallelEnumerable does all the heavy lifting, and you can easily resolve the problem by reducing this specific scenario to the more common scenario of zipping two parallel sequences—a scenario common enough that it’s addressed in the framework.
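PLINQ does not preserve the order of the source sequence unless asked to. If the order of the output lines matters to the caller, one option is to opt in with AsOrdered() on both inputs. The sketch below shows that variant; the OrderedZipDemo class and its Number method are hypothetical, and the ordering requirement is an assumption about the caller rather than part of the solution above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OrderedZipDemo
{
    public static List<string> Number(IReadOnlyList<string> names)
    {
        var linenos = Enumerable.Range(1, names.Count);
        return linenos.AsParallel().AsOrdered()
            // No shared state: each output line depends only on its own pair.
            .Zip(names.AsParallel().AsOrdered(), (lineno, item) => $"{lineno}) {item}")
            .ToList();
    }
}
```

Because the lambda passed to Zip is pure, the work can be split across threads freely, and AsOrdered() only affects how the results are merged back together. Order preservation has a cost, so it is worth requesting only when the output order is actually needed.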

In this scenario, it was possible to enable parallel execution by removing state updates altogether, but this isn’t always the case, nor is it always this easy. But the ideas you’ve seen so far already put you in a better position when tackling issues related to parallelism, and more generally concurrency.


Further Reading

Books

  • Functional Programming in C#: How to write better C# code - Functional Programming in C# teaches you to apply functional thinking to real-world problems using the C# language. The book, with its many practical examples, is written for proficient C# programmers with no prior FP experience. It will give you an awesome new perspective.

  • Real-World Functional Programming: With Examples in F# and C# - Real-World Functional Programming is a unique tutorial that explores the functional programming model through the F# and C# languages. The clearly presented ideas and examples teach readers how functional programming differs from other approaches. It explains how ideas look in F#, a functional language, as well as how they can be successfully used to solve programming problems in C#. Readers build on what they know about .NET and learn where a functional approach makes the most sense and how to apply it effectively in those cases.
