C# String
A string in C# is an object of type String whose value is text. The string object contains an array of Char objects internally. A string has a Length property that shows the number of Char objects it contains.
String is a reference type that looks like value type (for example, the equality operators == and != are overloaded to compare on value, not on reference).
In C#, you can refer to a string both as String
and string
. You can use whichever naming convention suits you. The string keyword is just an alias for the .NET Framework’s String.
Contents
- String Interning
- String Immutability
- Format Strings
- Escape Sequences and Verbatim Strings
- Searching For Strings
- Further Reading
String Interning
In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable.
Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned.
The distinct values are stored in a string intern pool.
String interning is supported by some modern programming languages, including .NET languages.
When creating two identical string literals in one compilation unit, the compiler ensures that only one string object is created by the CLR. This is string interning, which is done only at compile time. Like in the above example.
Doing it at runtime would incur too much of a performance penalty. Searching through all strings every time you create a new one is too costly. Below are some example where string interning does not work and creates new object in the heap.
Note: The string object contains an array of Char objects internally. A string has a Length property that shows the number of Char objects it contains. You can also use a foreach loop or indexer with a string and access its individual Char objects.
The static String.Copy(string ...)
method creates a new instance of a string by copying an existing instance.
Let’s run the program and see the result,
Here, ReferenceEquals method determines whether the specified System.Object
instances (String in our case) are the same instance.
Thus, ReferenceEquals(s1, s3) and ReferenceEquals(s1, s4) will be false in our case because, s1, s3, and s4 are three different objects in the heap.
However, ReferenceEquals(s1,s2) will be true because, s1 and s2 have reference to same object.
Now, the below condition will be true in our case because, String is a reference type that looks like value type. Here, the equality operators == and !=, for String type, are overloaded to compare on value and not on reference.
String Immutability
One of the special characteristics of a string is that it is immutable, so it cannot be changed after it has been created. Every change to a string will create a new string. This is why all of the String manipulation methods return a string.
Immutability is useful in many scenarios. Reasoning about a structure if you know it will never change is easier. It cannot be modified so it is inherently thread-safe.
It is more secure because no one can mess with it. Suddenly something like creating undo-redo is much easier, your data structure is immutable and you maintain only snapshots of your state.
But immutable data structures also have a negative side. Let’s see how,
The above looks innocent, but it will create a new string for each iteration in your loop. It uses a lot of unnecessary memory and shows why you have to use caution when working with strings.
This code will run 10,000 times, and each time it will create a new string. The reference s will point only to the last item, so all other strings are immediately ready for garbage collection.
When working with such a large number of string operations, you have to keep in mind that string is immutable and that the .NET Framework offers some special helper classes when dealing with strings. Let’s see one such class
StringBuilder
The StringBuilder class can be used when you are working with strings in a tight loop. Instead of creating a new string over and over again, you can use the StringBuilder, which uses a string buffer internally to improve performance.
The StringBuilder class even enables you to change the value of individual characters inside a string.
Our previous example of concatenating a string 10,000 times can be rewritten with a StringBuilder,
One thing to keep in mind is that the StringBuilder does not always give better performance.
Note: When concatenating a fixed series of strings, the compiler can optimize this and combine individual concatenation operations into a single operation. When you are working with an arbitrary number of strings, such as in the loop example, a StringBuilder is a better choice (in this example, you could have also used
new String('x', 10000)
to create the string; when dealing with more varied data, this won’t be possible).
Format Strings
The String.Format
method allows a wide range of formatting options for string data.
The first parameter of this method can be passed a string that may look similar to the following:
Here, ‘The value of i is ‘ will be displayed as is, with no changes. The interesting part of this string is the section enclosed in braces. This section has the following form:
The section can contain the following three parts:
-
index: A number identifying the zero-based position of the section’s data in the args parameter array. The data is to be formatted accordingly and substituted for this section. This number is required.
-
alignment: The number of spaces to insert before or after this data. A negative number indicates left justification (spaces are added to the right of the data), and a positive number indicates right justification (spaces are added to the left of the data). This number is optional.
-
formatString: A string indicating the type of formatting to perform on this data. This section is where most of the formatting information usually resides. Tables Table 2-2 and Table 2-3 contain valid formatting codes that can be used here. This part is optional.
In our case, {0,5:G}
, 0 is our index, 5 is the number of spaces to be added before value and G represents the standard numeric format which displays the value in its shortest form.
You can get a complete list of the standard format specifiers here.
String Interpolation
String interpolation available in C# 6.0 and later, interpolated strings are identified by the $ special character and include interpolated expressions in braces.
String interpolation achieves the same results as the String.Format method, but improves ease of use and inline clarity.
Use string interpolation to improve the readability and maintainability of your code.
Instead of passing i as second parameter we can directly use variable i within the string (in place of index)
ToString
In addition to the String.Format and the Console.WriteLine methods, the overloaded ToString instance method of a value type may also use the previous formatting characters.
Using ToString, the code would look like this:
Here the overloaded ToString method accepts a single parameter of type IFormatProvider. The IFormatProvider provided for the f.ToString method is a string containing the formatting for the value type plus any extra text that needs to be supplied.
Overriding ToString
For user defined types we can override the ToString method. System.Object
has a virtual ToString
method. Object class supports all classes in the .NET class hierarchy and provides low-level services to derived classes. System.Object
is the ultimate base class of all .NET classes; it is the root of the type hierarchy.
Overriding Object’s ToString method is a good practice. If you don’t do this, ToString will return by default the name of your type. When you override ToString, you can give it a more meaningful value, as below code shows.
You can also implement this custom formatting on your own types. You do this by creating a ToString(string) method on your type. When doing this, make sure that you are compliant with the standard format strings in the .NET Framework.
For example, a format string of G should represent a common format for your object (the same as calling ToString()) and a null value for the format string should also display the common format.
Here is an example,
Escape Sequences and Verbatim Strings
When declaring a string variable, certain characters can’t, for various reasons, be included in the usual way. C# supports two different solutions to this problem.
The first approach is to use ‘escape sequences’. For example, suppose that we want to set variable a to the value:
We could declare this using the following command, which contains escape sequences for the quotation marks and the line break.
The following table gives a list of the escape sequences for the characters that can be escaped in this way:
Character | Escape Sequence |
' | \' |
" | \" |
\ | \ |
Alert | \a |
Backspace | \b |
Form feed | \f |
New Line | \n |
Carriage Return | \r |
Horizontal Tab | \t |
Vertical Tab | \v |
A unicode character specified by its number e.g. \u200 | \u |
A unicode character specified by its hexidecimal code e.g. \xc8 | \x |
null | \0 (zero) |
The second approach is to use ‘verbatim string’ literals. These are defined by enclosing the required string in the characters @” and “. To illustrate this, to set the variable ‘path’ to the following value:
we could either escape the back-slash characters
or use a verbatim string thus:
Usefully, strings written using the verbatim string syntax can span multiple lines, and whitespace is preserved. The only character that needs escaping is the double-quote character, the escape sequence for which is two double-quotes together. For instance, suppose that you want to set the variable ‘text’ to the following value:
Using the verbatim string syntax, the command would look like this:
Searching For Strings
When working with strings, you often look for a substring inside another string (to parse some content or to check for valid user input or some other scenario).
The String class offers a couple of methods that can help you perform all kinds of search actions. The most common are IndexOf, LastIndexOf, StartsWith, EndsWith, and SubString.
IndexOf returns the index of the first occurrence of a character or substring within a string. If the value cannot be found, it returns -1. The same is true with LastIndexOf, except this method begins searching at the end of a string and moves to the beginning.
StartsWith and EndsWith see whether a string starts or ends with a certain value, respectively. It returns true or false depending on the result.
Substring can be used to retrieve a partial string from another string. You can pass a start and a length to Substring. If necessary, you can calculate these indexes by using IndexOf or LastIndexOf.
Use the Replace method to replace all occurrences of a specified substring with a new string. Like the Substring method, Replace actually returns a new string and does not modify the original string.
Further Reading
-
What is string interpolation? by Iris Classon - In “(Not so) Stupid Question” section of her blog Iris talks about string interpolation and explains its internal working.
-
string vs. String is not a style debate by Jared Parsons - Jared Parson who works on the C# compiler notes that String vs string is not a style debate. He makes a compelling case for why you should always use string.
-
C# String.Format() and StringBuilder - This article discusses whether String.Format is as efficient as StringBuilder in .NET.
Subscribe to Code with Shadman
Get the latest posts delivered right to your inbox