PChars: no strings attached

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. — Alan Perlis

In the public Delphi newsgroups on the Embarcadero server, or in the Delphi tags on StackOverflow, I often see that there is still great confusion about the PChar type on the one hand, and the string type on the other. In this article I would like to discuss the similarities and the differences between both types, as well as some things you should or shouldn’t do with them.

The general principles laid out in this article apply to all Win32, Win64 and OS X versions of Delphi, including Delphi 2009 and up. There is, however, a special “chapter” at the end of this article for those who use Delphi 2009 and up.

PChar

Trying to outsmart a compiler defeats much of the purpose of using one. — Kernighan and Plauger, The Elements of Programming Style.

PChars were inspired by strings as they are used in the C language. Most Windows API functions have a C interface and accept C-style strings. To be able to call those APIs, Borland had to introduce a type that mimicked C strings in Turbo Pascal, the ancestor of Delphi.

In C, there is no real string type, not like there is in Delphi. Strings are just arrays of characters, and the end of the text is marked by a character with ASCII code zero. This allows them to be very long (unlike Turbo Pascal’s string type, which stored its length in a single byte and was therefore limited to 255 characters; that type lives on as Delphi’s ShortString), but a bit awkward to use. The beginning of the array is simply marked by a char *, which is a pointer to a char. The exact Delphi equivalent is ^Char. This has become the type PChar in Turbo Pascal and Delphi.

To traverse a string in C, you can increment or decrement the pointer using code like s++ or --s, or use the pointer as if it were an array (this is true for all pointers in C) and write s[20] to indicate the 21st character, since counting starts at 0. But C pointer arithmetic not only allows incrementing and decrementing the pointer, it also allows calculating the sum of a pointer and a number, or the difference between two pointers. In C, *(s + 20) is equivalent to s[20] (* is the C dereference operator, much like Delphi’s ^). Borland introduced almost the same syntax for the PChar type in Turbo Pascal, if the {$X+} (extended syntax) directive was set.
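Here is a small sketch of how those C idioms translate to PChar in Delphi. It assumes Delphi 2009 or later, where pointer arithmetic on PChar is available by default; the program and variable names are only for illustration.

```pascal
program PCharArithmetic;

{$APPTYPE CONSOLE}

var
  S, P: PChar;
begin
  S := 'Delphi';        // S points to the first character of the literal
  Writeln(S[2]);        // like s[2] in C: the third character, 'l'
  Writeln((S + 2)^);    // like *(s + 2) in C: the same character
  P := S;
  Inc(P, 3);            // like p += 3 in C: P now points to the 'p'
  Writeln(P - S);       // difference of two PChars, in characters: 3
  Writeln(string(P));   // the text from P up to the #0: 'phi'
end.
```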

In Delphi version 2009 and up, pointer arithmetic (or pointer math, as the Delphi developers called it) is supported for all pointer types, if the directive {$POINTERMATH ON} is used.
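As a sketch of what {$POINTERMATH ON} enables, the following treats a PInteger the way PChar has always been treated. The array and its values are made up for the example.

```pascal
program PointerMathDemo;

{$APPTYPE CONSOLE}

{$POINTERMATH ON}  // allow PChar-style arithmetic on other typed pointers

var
  Numbers: array[0..3] of Integer;
  P: PInteger;
  I: Integer;
begin
  for I := 0 to 3 do
    Numbers[I] := I * 10;
  P := @Numbers[0];
  Writeln(P[2]);      // index a PInteger like an array: 20
  Writeln((P + 3)^);  // P + 3 advances 3 Integers, not 3 bytes: 30
end.
```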

Just a pointer

Despite the slightly extended syntax and handling described further on, never forget that a PChar is just a pointer, like in C. And also like in C, you can use it as if it were an array (i.e. the pointer points to the first character in the array). But it isn’t an array! A PChar has no automatic storage, unlike the convenient Delphi type string. If you copy text to a PChar-“string”, you must always make sure that the PChar actually points to a valid array, and that the array is large enough to hold the text.

Like in C, a PChar variable merely points to a Char. Usually, as in C, this Char is part of an array of Char that ends in a Char with ordinal value 0 and such an array is often used to pass text around between functions, but there is no guarantee that the character is part of a larger array, and there is no guarantee that there is a 0 at the end. This is only a convention.
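To make the convention concrete, here is a hypothetical helper, CountUntilZero (not an RTL routine), that does what StrLen does: it walks the characters until it meets a #0. Nothing in the PChar type itself stops it from running past the end of the array if the #0 is missing.

```pascal
program ScanToZero;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: counts the characters before the terminating #0.
// It is only as safe as the convention: if there is no #0, it keeps going.
function CountUntilZero(P: PChar): Integer;
begin
  Result := 0;
  if P = nil then
    Exit;
  while P^ <> #0 do
  begin
    Inc(P);
    Inc(Result);
  end;
end;

procedure Demo;
var
  Buffer: array[0..9] of Char;  // local, so its initial contents are undefined
  P: PChar;
begin
  P := Buffer;
  P[0] := 'D';
  P[1] := '6';
  P[2] := #0;                   // the terminator the convention relies on
  Writeln(CountUntilZero(P));   // 2
  Writeln(StrLen(P));           // 2: StrLen relies on the same convention
end;

begin
  Demo;
end.
```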

And like with any other pointers, you can make mistakes with them.

// WARNING: BAD EXAMPLE
var
  S: PChar;
begin
  S[0] := 'D'; // S was never assigned, so this writes to an undefined address
  S[1] := '6';

The code above did not allocate storage for the string, so it tries to store the characters starting at some undefined location in memory (S was never assigned an address, so it simply contains whatever bit pattern happened to be in its memory location; see my article on pointers). This can cause problems like memory corruption, and can even lead to a program crash or, worse, to wrong results. It is your responsibility to ensure that the array exists. The easiest way is to use a local array:

var
  S: PChar;
  A: array[0..100] of Char;
begin
  S := A;
  S[0] := 'D'; // this is equivalent to A[0] := 'D';
  S[1] := '6'; // you could also write: (S + 1)^ := '6';

The above code stores the characters in the array. But if you try to display the string at S, it will probably display lots of nonsense. That is because the string didn’t end in a #0 character. OK, you could simply add another line:

  S[2] := #0; // or: (S + 2)^ := #0;

and you would get a display of the text "D6". But storing characters one by one is really inconvenient. Pointing a PChar at a text is much simpler: you simply set the PChar to the start of an already existing array that contains a #0-terminated text. Luckily, string constants like 'Delphi' are also such arrays, and can be used with PChars:

var
  S: PChar;
begin
  S := 'Delphi';

You should however be aware that this only changes the value of the pointer S. No text is moved or copied around. The text is simply stored somewhere in the program (with a #0 terminator), and S is set to its start address. If you do:

// WARNING: BAD EXAMPLE
var
  S: PChar;
  A: array[0..100] of Char;
begin
  S := A;
  S := 'Delphi';

this does not copy the text 'Delphi' to the array A. The line S := A; points S to the array A, but immediately after that, the next line only changes S (a pointer!) to the address of the literal string. If you want to copy text to the array, you must do that explicitly, using, for instance, StrCopy or StrLCopy:

var
  S: PChar;
  A: array[0..100] of Char;
begin
  S := A;
  StrCopy(S, 'Delphi');

or

  StrLCopy(S, 'Delphi', Length(A) - 1);

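In this simple case it is obvious that 'Delphi' will easily fit in the array, so the use of StrLCopy seems a bit overdone, but on other occasions, where you don’t know the length of the text in advance, you should use StrLCopy to avoid overrunning the array bounds. Note the - 1: StrLCopy copies at most the given number of characters and then always appends a #0, so the array must have room for one more character than that.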
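The following sketch shows what StrLCopy buys you when the source is longer than the buffer: the text is cut off instead of overrunning the array, and the result is still #0-terminated. The buffer size and the sample text are arbitrary.

```pascal
program BoundedCopy;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A: array[0..10] of Char;  // room for 10 characters plus the #0
  P: PChar;
  Source, Copied: string;
begin
  Source := 'Delphi for Win32';               // pretend its length is unknown
  P := A;                                     // P points to A[0]
  StrLCopy(P, PChar(Source), Length(A) - 1);  // at most 10 characters, then #0
  Copied := P;                                // assigning a PChar copies up to the #0
  Writeln(Copied);                            // Delphi for
end.
```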

A static array like A is useful as a text buffer for small strings of a known maximum size, but often you’ll have strings whose size is unknown when the program is compiled. In that case you’ll have to use dynamic allocation of a text buffer. You can for instance use StrAlloc or StrNew to create a buffer, or GetMem, but then you’ll have to remember to free the memory again, using StrDispose or FreeMem. If you want to avoid low-level routines, you could use a dynamic array of Char (or TArray<Char>), but that is not quite as convenient as using a Delphi string as a text buffer. But before I describe how to do that, I want to discuss the string type itself.

String

A world without string is chaos — Randolf Smuntz, Mouse Hunt

Allow me to confuse you: a string or, more precisely, an AnsiString (in Delphi 2009 and higher: a UnicodeString) is in fact a PChar. Just like a PChar, it is a pointer to an array of characters, terminated by a #0 character. But there is one big difference: you normally don’t have to think about how strings work. They can be used almost like any other variable. The compiler takes care that the appropriate code to allocate, copy and free the text is called. So instead of you calling routines like StrCopy, the compiler takes care of such chores.

But there is more. Although the text is always terminated by a #0, that terminator is only there to make strings compatible with C-style strings; the compiler itself doesn’t need it. In front of the text in memory, at a negative offset, the length of the string is stored, as an Integer. So to know the length of the string, the compiler only has to read that Integer, and not count characters until it finds a #0. That means that you can store #0 characters in the middle of the string without confusing the compiler. But routines that rely on the #0 rather than on the length, like most C-style and API functions, will consider the text to end at the first embedded #0.
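A quick way to see both facts, the stored length and the #0 convention, side by side, is a string with an embedded #0. A minimal sketch:

```pascal
program EmbeddedZero;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
begin
  S := 'AB'#0'CD';
  Writeln(Length(S));          // 5: the stored length includes the embedded #0
  Writeln(StrLen(PChar(S)));   // 2: StrLen stops at the first #0
  Writeln(string(PChar(S)));   // AB: a round trip through PChar loses 'CD'
end.
```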

Normally, each time you assigned one string to another variable, the compiler would have to allocate memory and copy the entire string to it. Because Delphi strings can be quite long (theoretically, up to 2GB), this could be slow. To avoid all that copying, Delphi uses a technique called “copy on write” (COW): on assignment, only a copy of the pointer is made, and a copy of the text is only made when the string data is about to be changed. Each string has a few fields of information stored in front of it. One is the reference count: the number of string variables that actually reference that particular string in memory. Only when it drops to 0 is the text no longer referenced, so that its memory can be freed.

The compiler makes sure that the reference count is always correct (although you can confuse it by casting; more on that later). If a string variable is declared in a var section of a function or procedure, or as a field of a class or record, it starts its life as nil, the internal representation of the empty string (''). As soon as string text is created and assigned to one of these variables, the reference count of that text is set to 1. Each additional assignment of that particular string to another variable increments its reference count. If a string variable leaves its scope (for instance when the function in which it was declared returns, or when the object it belongs to is destroyed), or is pointed to a new string, the reference count of the text it referenced is decremented.

A simple example:

function PlayWithStrings: string;
var
  S1, S2: string;
begin
  S1 := IntToStr(123456);

Now S1 points to the text '123456' and has a reference count of 1.

  S2 := S1;

No text is copied yet; S2 is simply set to the same address as S1, but the reference count of the text '123456' is now 2.

  S2 := 'The number is ' + S2;

Now a new, larger buffer is allocated, the text 'The number is ' is copied to it, and the text '123456' is appended. But since S2 doesn’t point to the original text '123456' anymore, the reference count of that text is decremented to 1 again.

  Result := S2;

Result is set to point to the same address as S2, and the reference count of the text 'The number is 123456' is incremented to 2.

end;

Now S1 and S2 leave their scope. The reference count for '123456' is decremented to 0, and the text buffer is freed. The reference count for 'The number is 123456' is decremented too, but only to 1, since the function result still points to it. So although the function has ended, the string is still around.
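StringRefCount, declared in the System unit of Delphi 2009 and up, lets you watch the walkthrough above happen. The sketch below repeats it inside a function with local variables. Because the compiler may introduce hidden temporaries, I would not rely on the exact numbers printed; the function only checks that the count goes up by one when S2 shares the text, and back down when S2 gets a new string.

```pascal
program RefCountWalkthrough;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// True if the reference count of S1's text rises by one while S2 shares it,
// and drops back once S2 is pointed to a new string.
function RefCountBehavesAsDescribed: Boolean;
var
  S1, S2: string;
  Alone, Shared, AfterConcat: Integer;
begin
  S1 := IntToStr(123456);                // a new string on the heap
  Alone := StringRefCount(S1);
  S2 := S1;                              // only the pointer is copied
  Shared := StringRefCount(S1);
  S2 := 'The number is ' + S2;           // S2 now refers to a new buffer
  AfterConcat := StringRefCount(S1);
  Writeln(Alone, ' ', Shared, ' ', AfterConcat);  // the walkthrough predicts 1 2 1
  Result := (Shared = Alone + 1) and (AfterConcat = Alone);
end;

begin
  Writeln(RefCountBehavesAsDescribed);
end.
```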

What is important to notice here is that strings are more or less independent of the variables or fields that reference them. Only the number of references is important. If that is not 0, the string is still referenced somewhere and it must remain in memory. If it becomes 0, the memory for the string and its associated data (length, reference count, codepage) can be freed.

Complicated? Yes, it is complicated, and it can get even more complicated with var, const and out parameters. But fortunately, you normally don’t have to worry about this. It only becomes important if you access strings from assembler or through a typecast to PChar, and casting strings to PChar is quite common.

The most important things to remember about strings are:

  • that text is only copied to a new string buffer if it is modified;
  • that the reference count and the length are not connected to a string variable, but to a specific text buffer (also known as the payload), to which more than one string variable can point;
  • that the reference count is always correct unless you fool the compiler by casting to a different type;
  • that assignments to a variable decrement the reference count of the text buffer it previously pointed to;
  • that if the reference count becomes 0, the string buffer is freed.
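The first point in the list can be checked by comparing payload addresses with Pointer() casts before and after a write. A minimal sketch (the function name is made up):

```pascal
program CopyOnWriteCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils;

function WriteMakesPrivateCopy: Boolean;
var
  S1, S2: string;
  SharedBefore, SharedAfter: Boolean;
begin
  S1 := IntToStr(123456);
  S2 := S1;                                  // both refer to one payload
  SharedBefore := Pointer(S1) = Pointer(S2);
  S2[1] := '9';                              // S2 gets its own copy before the write
  SharedAfter := Pointer(S1) = Pointer(S2);
  Writeln(S1, ' ', S2);                      // 123456 923456
  Result := SharedBefore and not SharedAfter and (S1 = '123456');
end;

begin
  Writeln(WriteMakesPrivateCopy);  // TRUE
end.
```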

Using strings and PChars together

If you can’t be a good example, then you’ll just have to be a horrible warning. — Catherine Aird

PChars and character arrays are awkward to use. Most of the time, you must allocate memory, and not forget to free it. If you want to add text, you must first calculate the size of the resulting text, reallocate the text buffer if it is too small, and use StrCat or StrLCat to finally add the text. You must use StrComp or StrLComp to compare strings, and so on.

Strings, on the other hand, are much simpler to use. Most things are done automatically. But many Windows (or Linux) API functions require PChars, and not strings. Fortunately, since strings are also pointers to zero-terminated text, you can use them as a PChar by simply casting them:

// ShellExecute is declared in the ShellAPI unit (Winapi.ShellAPI in XE2 and up)
var
  S: string;
begin
  S := ExtractFilePath(ParamStr(0)) + 'MyDoc.doc';
  ShellExecute(0, 'open', PChar(S), nil, nil, SW_SHOW);
end;

Don’t forget that a string variable is a pointer to text, and not a text buffer itself. If the text is modified, it is often copied to a new location, and the address in the variable is adjusted accordingly. That means that you should not use a PChar to point to the string and then modify the string. It is best to avoid doing something like:

// WARNING: BAD EXAMPLE
var
  S: string;
  P: PChar;
begin
  S := ParamStr(0); // say, this returns 'C:\Test.exe';
  P := PChar(S);
  S := 'Something else';

If S is changed to 'Something else', P is not changed with it, and still points to 'C:\Test.exe'. Since P is not a string reference to that text, and there is no other string variable pointing to it, its reference count is decremented to 0, and the text is discarded. That means that P now points to invalid memory and has become a so-called dangling pointer.

In the above, I originally had a link to the Wikipedia article on dangling pointers, but that contains so much nonsense, lack of insight and bad advice for the avoidance of dangling pointers that I removed it again. If these people do not know if a pointer under their control points to valid memory or not, there is something terribly wrong with their design.

It is wise not to confuse the compiler by mixing PChar and string variables, unless you know what you are doing. The compiler does not recognize a PChar as a string, so it does not change the reference count of the string memory if you reference it with a PChar. It is often better not to use a PChar variable like this at all. Simply use the string type as much as possible, and only cast to PChar at the last possible moment. Functions accepting a PChar parameter should copy the text they receive to their own buffer as soon as possible, so it doesn’t matter what happens to the original.
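A sketch of such a function, using a hypothetical TNameHolder class (not part of any library): the setter copies the text into a string field immediately, and the caller casts to PChar only at the call itself.

```pascal
program CopyOnEntry;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical class, only for illustration.
  TNameHolder = class
  private
    FName: string;
  public
    procedure SetNameFromPChar(Value: PChar);
    property Name: string read FName;
  end;

procedure TNameHolder.SetNameFromPChar(Value: PChar);
begin
  // Assigning a PChar to a string copies the text into a new string buffer,
  // so from here on it does not matter what happens to Value's memory.
  FName := Value;
end;

function HolderKeepsItsCopy: Boolean;
var
  Holder: TNameHolder;
  S: string;
begin
  Holder := TNameHolder.Create;
  try
    S := IntToStr(42) + ' apples';
    Holder.SetNameFromPChar(PChar(S));  // cast only at the call itself
    S := 'something else';              // the old text of S may be freed now
    Result := Holder.Name = '42 apples';
  finally
    Holder.Free;
  end;
end;

begin
  Writeln(HolderKeepsItsCopy);  // TRUE
end.
```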

Normally, string buffers are only as large as necessary to contain the text assigned to them. But using SetLength you can set the string buffer to any size you need. This makes string buffers useful as text buffers to receive text. Windows API functions that return a text in a character array can be used like this:

function WindowsDirectory: string;
begin
  SetLength(Result, MAX_PATH);
  GetWindowsDirectory(PChar(Result), Length(Result));
  SetLength(Result, StrLen(PChar(Result)));
end;

Alternatively, since you can assign a PChar to a string, and that will result in a new string with a copy of the text, you can set the length of the string just as well with this functionally equivalent code:

  Result := PChar(Result);

The last line of the function sets the length of the string back to the length of the C-style string that was stored in the buffer. If you need the result as a PChar anyway, to be processed by further API routines, you may perhaps be tempted to do this instead:

// WARNING: BAD EXAMPLE
function WindowsDirectoryAsPChar: PChar;
var
  Buffer: array[0..MAX_PATH] of Char;
begin
  GetWindowsDirectory(Buffer, MAX_PATH);
  Result := Buffer;
end;

This will however fail. Because Buffer is a local variable, the entire buffer is in local memory (on the processor stack). As soon as the function returns, that memory is reused by other routines, so the text the result points to will soon be overwritten with gibberish. Local buffers should never be used to return text.

But even if you had used a dynamic allocation with StrAlloc or a similar routine, the user would have to free the buffer. It generally is not a good idea to return PChars like that. Better follow the example of GetWindowsDirectory, and let the user of the function provide a buffer and its length. You then simply fill the buffer (using StrLCopy) up to the given length.

There is an alternative implementation of the function WindowsDirectory that can safely use a local buffer. It relies on the fact that you can assign a PChar (or a zero-based array of Char) to a string directly. To make the text a Delphi string (with length and reference count fields), a Delphi string buffer of the required length is allocated, and the text is copied into it. So even after the local buffer is discarded, the text in the string buffer is still there:

function WindowsDirectory: string;
var
  Buffer: array[0..MAX_PATH] of Char;
begin
  GetWindowsDirectory(Buffer, MAX_PATH);
  Result := Buffer; // StrLen(Buffer) characters copied!
end;
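Windows is not needed to try both techniques. In the following sketch, FillGreeting is a hypothetical stand-in for an API routine like GetWindowsDirectory: it fills a caller-supplied buffer and never writes more than BufLen characters, including the #0.

```pascal
program ReceiveBuffers;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for an API routine such as GetWindowsDirectory.
// BufLen is the buffer size in characters, including room for the #0.
function FillGreeting(Buffer: PChar; BufLen: Integer): Integer;
const
  Greeting = 'Hello from a C-style API';
begin
  if (Buffer = nil) or (BufLen < 1) then
    Exit(0);
  StrLCopy(Buffer, Greeting, BufLen - 1);  // never writes past BufLen characters
  Result := StrLen(Buffer);
end;

// Technique 1: a Delphi string serves as the receive buffer.
function GreetingViaString: string;
begin
  SetLength(Result, 100);
  FillGreeting(PChar(Result), Length(Result));
  SetLength(Result, StrLen(PChar(Result)));  // trim to the C-style length
end;

// Technique 2: a local array, copied into a string before it goes away.
function GreetingViaArray: string;
var
  Buffer: array[0..99] of Char;
begin
  FillGreeting(Buffer, Length(Buffer));
  Result := Buffer;  // copies the text up to the #0 into a new string
end;

begin
  Writeln(GreetingViaString);
  Writeln(GreetingViaArray);
end.
```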

But how would you yourself write a function, for instance in a DLL, that must pass back text as a PChar? I think you should follow the example of GetWindowsDirectory again. Here is a simple DLL function, returning a version string that is stored in our DLL:

// DLLVersion (a string) and VersionNum (an Integer) are assumed to be
// declared elsewhere in the DLL.

// Having a separate function to get the length is clearer than
// asking GetDLLVersion to provide that length if parameters are nil.
function GetDLLVersionLength: Integer; stdcall;
begin
  Result := Length(DLLVersion + IntToStr(VersionNum));
end;

// Returns the number of characters copied, excluding the terminating #0.
// MaxLen is the size of Buffer in characters, including room for the #0.
function GetDLLVersion(Buffer: PChar; MaxLen: Integer): Integer; stdcall;
begin
  if (Buffer <> nil) and (MaxLen > 1) then
  begin
    StrLCopy(Buffer, PChar(DLLVersion + IntToStr(VersionNum)), MaxLen - 1);
    Result := StrLen(Buffer);
  end
  else
    Result := 0;
end;

exports
  GetDLLVersionLength,
  GetDLLVersion;

As you can see, the string is simply copied to the provided buffer with StrLCopy. Because the user must provide the buffer, you avoid any memory management problems. If you provided it, the user would have to know how to free it, and FreeMem in the calling program generally can’t free memory that was allocated by the DLL’s own memory manager. But even if it could, a user of the DLL who writes in C or Visual Basic would not know how to free the buffer in that language, since memory management is different in each language. Letting the user provide the buffer makes him or her independent of your implementation.
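On the calling side, the intended use is: ask for the length, provide a buffer that is one character larger (for the #0), and let GetDLLVersion fill it. The sketch below compiles the two functions into the same program so it runs without a DLL; the values of DLLVersion and VersionNum are made up, and I declared the functions stdcall because that is what C and Visual Basic callers expect.

```pascal
program DLLCallerSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  DLLVersion = 'MyLib version ';  // hypothetical value
  VersionNum = 3;                 // hypothetical value

// Stand-ins for the DLL functions; in real use they would be imported with
// an external declaration instead.
function GetDLLVersionLength: Integer; stdcall;
begin
  Result := Length(DLLVersion + IntToStr(VersionNum));
end;

function GetDLLVersion(Buffer: PChar; MaxLen: Integer): Integer; stdcall;
begin
  if (Buffer <> nil) and (MaxLen > 1) then
  begin
    StrLCopy(Buffer, PChar(DLLVersion + IntToStr(VersionNum)), MaxLen - 1);
    Result := StrLen(Buffer);
  end
  else
    Result := 0;
end;

// Caller side: the caller owns the buffer, so nobody has to free foreign memory.
function VersionText: string;
var
  Copied: Integer;
begin
  SetLength(Result, GetDLLVersionLength + 1);  // one extra character for the #0
  Copied := GetDLLVersion(PChar(Result), Length(Result));
  SetLength(Result, Copied);                   // drop the #0 and anything after it
end;

begin
  Writeln(VersionText);  // MyLib version 3
end.
```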

Casting

A little more on casting. Take a look at the following piece of code.

var
  A, B, C: string;
begin
  A := 'Hello';
  Writeln(NativeInt(PChar(@A[1]))); { 1 }
  B := A;
  Writeln(NativeInt(PChar(A)));     { 2 }
  C := B;
  Writeln(NativeInt(Pointer(A)));   { 3 }
end;

This displays the same number three times (the address of the string’s payload, as an integer). But under the hood, different things happen, depending on the cast. If you are familiar with the CPU view of the IDE, you can see this for yourself.

For the line marked with { 1 }, the compiler inserts a call to _UniqueString, because taking the address of the single element A[1] is treated like a write and triggers copy on write, even though nothing is actually written. Since A still refers to the literal 'Hello' at that point, a new, unique copy of the string is made. Had A been empty, accessing A[1] would cause a range check error if range checking ({$R+}) is enabled.

For the line marked with { 2 }, the compiler inserts a call to _UStrToPWChar (or a similar function, depending on the Delphi version), which simply passes on the address it is given, unless the string is empty. If the string is empty, _UStrToPWChar returns the address of a “string” that consists of a single #0 character.

For the line marked with { 3 }, the compiler does nothing special. The cast simply returns the address stored in A.
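The difference between the { 2 } and { 3 } casts shows up most clearly with an empty string. A minimal sketch, based on the behaviour of _UStrToPWChar described above:

```pascal
program EmptyStringCasts;

{$APPTYPE CONSOLE}

var
  Empty: string;
  P: PChar;
begin
  Empty := '';
  Writeln(Pointer(Empty) = nil);  // TRUE: an empty string is a nil pointer
  P := PChar(Empty);              // goes through _UStrToPWChar (or similar)
  Writeln(P <> nil);              // TRUE: a pointer to a #0 is substituted
  Writeln(Ord(P^));               // 0
end.
```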

Delphi 2009 and up

In Delphi 2009, strings changed considerably. Before, i.e. in Delphi 2 up to Delphi 2007, string mapped to AnsiString, and each character was a single-byte AnsiChar. A PChar was in fact a PAnsiChar. But in Delphi 2009, strings were made to use 16-bit Unicode, or to be precise, UTF-16, which meant that a new string type was required: UnicodeString. This string type is based on WideChars. It became the default string type, which meant that string now mapped to UnicodeString, Char to WideChar and PChar to PWideChar.

Delphi for Win32 already had the string type WideString, but that type wraps a COM BSTR, which is allocated by the OS and has no reference count or “copy on write”, so each assignment means that a new, unique, full copy of the text has to be made. That makes WideString rather slow, and that is why the new UnicodeString type was introduced.

Besides the length and the reference count fields, each string type, i.e. AnsiString as well as UnicodeString, got extra fields stored before the text: a Word containing the code page (encoding) of the string, mainly relevant for single-byte strings like AnsiString, and a Word containing the character (element) size. The code page of an AnsiString governs how characters with byte values 128 up to 255 are interpreted and converted; the character size is mainly necessary for interfacing with C++ code.
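Delphi 2009 and up provide StringElementSize and StringCodePage in the System unit to read these header fields. A minimal sketch:

```pascal
program StringHeaderFields;

{$APPTYPE CONSOLE}

var
  U: string;
  A: AnsiString;
begin
  U := 'Hello';
  A := AnsiString(U);              // explicit conversion to the system code page
  Writeln(StringElementSize(U));   // 2: UnicodeString elements are WideChars
  Writeln(StringCodePage(U));      // 1200: the code page number of UTF-16
  Writeln(StringElementSize(A));   // 1: AnsiString elements are AnsiChars
  Writeln(StringCodePage(A));      // the ANSI code page in effect, for instance 1252
end.
```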

Additionally, two other string types were introduced: RawByteString and UTF8String. UTF8Strings are meant to contain text in UTF-8 format, which means that each element is an AnsiChar, but that “characters” can be encoded as multiple AnsiChars. Note that I put “characters” in quotes, since in the context of Unicode, it is more accurate to speak of code points.

As you can see in the Wikipedia article about UTF-16, some code points require two WideChars in UTF-16 as well, a so-called “surrogate pair”. So the Length of a UnicodeString or a UTF8String does not necessarily correspond to the number of code points it contains. However, in UnicodeString, surrogate pairs are pretty rare, while in UTF8String, every code point outside the ASCII range takes more than one AnsiChar.
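If you need the number of code points rather than the number of WideChars, you have to look for surrogate pairs yourself. The function CodePointCount below is a hypothetical helper written for this sketch; it assumes well-formed UTF-16 and counts an unpaired surrogate as one code point.

```pascal
program CodePoints;

{$APPTYPE CONSOLE}

// Hypothetical helper: a high surrogate ($D800..$DBFF) followed by a low
// surrogate ($DC00..$DFFF) counts as a single code point.
function CodePointCount(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if (I < Length(S)) and (S[I] >= #$D800) and (S[I] <= #$DBFF)
      and (S[I + 1] >= #$DC00) and (S[I + 1] <= #$DFFF) then
      Inc(I, 2)   // surrogate pair: two WideChars, one code point
    else
      Inc(I);
    Inc(Result);
  end;
end;

var
  S: string;
begin
  S := 'A' + #$D83D#$DE00 + 'B';  // 'A', U+1F600 as a surrogate pair, 'B'
  Writeln(Length(S));              // 4 WideChars
  Writeln(CodePointCount(S));      // 3 code points
end.
```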

Another new string type is the RawByteString. If you assign an AnsiString with one type of encoding to a string with a different encoding, an automatic conversion will take place, which could result in a loss of data, if characters from one encoding have no equivalent in the other. AnsiStrings use a default encoding, governed by system settings. RawByteString, however, is a string without any encoding, so you can be sure that if you assign your AnsiString or UTF8String to one (usually when passing one of them as a parameter), no conversion will take place.

The Delphi 2009 help says about RawByteString:

RawByteString enables the passing of string data of any code page without doing any codepage conversions. Normally, this means that parameters of routines that process strings without regard for the string’s code page should be of type RawByteString. Declaring variables of type RawByteString should rarely, if ever, be done, because this can lead to undefined behavior and potential data loss.
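A minimal sketch of a routine with a RawByteString parameter, assuming Delphi 2009 or later. ByteCountAndCodePage is a hypothetical name; StringCodePage is the System function that reads the code page field of a string. Passing a UTF8String shows that the argument arrives unconverted, with its UTF-8 code page intact.

```pascal
program RawByteStringParam;

{$APPTYPE CONSOLE}

// Hypothetical routine that treats its argument as plain bytes.
// A const RawByteString parameter accepts any AnsiString type as-is.
function ByteCountAndCodePage(const S: RawByteString; out CodePage: Word): Integer;
begin
  CodePage := StringCodePage(S);  // the payload keeps its original code page
  Result := Length(S);            // number of bytes (AnsiChars)
end;

var
  U8: UTF8String;
  CP: Word;
begin
  U8 := UTF8String('Gr' + #$00FC + #$00DF + 'e');  // "Grüße", converted to UTF-8
  Writeln(ByteCountAndCodePage(U8, CP));  // 7: ü and ß take two bytes each
  Writeln(CP);                            // 65001: still UTF-8, nothing converted
end.
```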

So what to do?

As you can see, in the text above, I hardly make any reference to the size of a Char. So anything I wrote in the article above can also be applied in Delphi 2009 and up. Most code using the techniques mentioned simply recompiles in Delphi 2009 and up, but instead of using AnsiStrings, it then uses UnicodeStrings, WideChars and PWideChars.

Win32 API functions often come in two versions, one that takes Ansi (i.e. single-byte) characters and (C-style) strings and one that takes Wide (Unicode, double-byte) characters and (C-style) strings. These two are usually distinguished by an A or a W at the end of their name, respectively. Delphi’s interface units for such API functions, like Windows.pas, generally also define a third version, without A or W at the end of the name (just like Microsoft does, in the C headers for these functions), and map that to the Ansi-based functions. One example from a Windows.pas from before Delphi 2009:

function GetShortPathName(lpszLongPath: PChar; lpszShortPath: PChar;
  cchBuffer: DWORD): DWORD; stdcall;
{$EXTERNALSYM GetShortPathName}
function GetShortPathNameA(lpszLongPath: PAnsiChar; lpszShortPath: PAnsiChar;
  cchBuffer: DWORD): DWORD; stdcall;
{$EXTERNALSYM GetShortPathNameA}
function GetShortPathNameW(lpszLongPath: PWideChar; lpszShortPath: PWideChar;
  cchBuffer: DWORD): DWORD; stdcall;
{$EXTERNALSYM GetShortPathNameW}
...
function GetShortPathName; external kernel32 name 'GetShortPathNameA';
function GetShortPathNameA; external kernel32 name 'GetShortPathNameA';
function GetShortPathNameW; external kernel32 name 'GetShortPathNameW';

As you can see, GetShortPathName is mapped to the function 'GetShortPathNameA'. You also see that the -A version is declared to take PAnsiChar strings, the -W version takes PWideChar strings, and the neutral version takes PChar strings.

In Delphi 2009 and up, such neutrally named function declarations are now mapped to the W variety, so now it becomes:

function GetShortPathName; external kernel32 name 'GetShortPathNameW';
function GetShortPathNameA; external kernel32 name 'GetShortPathNameA';
function GetShortPathNameW; external kernel32 name 'GetShortPathNameW';

This means that in Delphi 2009 and up, whether you call Windows API functions or runtime library and VCL functions, most of the time you don’t have to worry about character size. Strings are now Unicode, and the API functions are now (mapped to) Unicode too, so if you keep using the size-neutral types string, Char and PChar, you won’t have to modify a lot of your code. And if some code happens to use the wrong character size (some API functions, like GetProcAddress, only exist in an Ansi version), you get a compiler warning or error, to which you can and should react.

I know that Unicode is more than just UTF-16. UTF-8 and UTF-32 are Unicode too. But in the context of strings in Delphi 2009 and above, with “Unicode” I actually mean UTF-16.

Conversions

Conversions between AnsiStrings and UnicodeStrings are automatic, but they produce a warning:

procedure Test;
var
  Ansi: AnsiString;
  Uni: string;
begin
  Ansi := 'Hello';
  Uni := Ansi;   // line 16 in the messages below: W1057
  Ansi := Uni;   // line 17 in the messages below: W1058
end;
[dcc32 Warning] Project1.dpr(16): W1057 Implicit string cast from 'AnsiString' to 'string'
[dcc32 Warning] Project1.dpr(17): W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'

You can avoid these warnings by telling the compiler that you know that a conversion takes place, by doing it explicitly:

  Uni := string(Ansi); // or Uni := UnicodeString(Ansi);
  Ansi := AnsiString(Uni);

But note that such an implicit conversion does not take place when you cast one of the string types to a PWideChar or PAnsiChar. So the following does not cause any automatic conversions:

var
  Ansi: AnsiString;
  Uni: UnicodeString;
  PAnsi: PAnsiChar;
  PWide: PWideChar;
begin
  Ansi := 'Hello';
  Uni := 'world';
  PWide := PChar(Ansi); // or PWideChar(Ansi); PWide now points to single-byte text
  PAnsi := PAnsiChar(Uni); // PAnsi now points to UTF-16 text

In such a case, you must perform an explicit conversion to the correct width first:

  PWide := PChar(string(Ansi));
  PAnsi := PAnsiChar(AnsiString(Uni));

SizeOf or Length?

Of course you must be careful with code that assumes that characters are byte-sized, especially code that uses low-level routines like GetMem, Move or FillChar. So to clear a static array[0..N] of Char, don’t do:

var
  Buffer: array[0..MAX_PATH] of Char;
begin
  // CAREFUL: SUSPICIOUS CODE
  FillChar(Buffer, MAX_PATH + 1, 0);

because Buffer is now made up of WideChars, which means it is now 2 * (MAX_PATH + 1) bytes in size. So if the size of such a buffer is required, you must use SizeOf:

  FillChar(Buffer, SizeOf(Buffer), 0);

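Note that SizeOf gives the size of the data only for static arrays. Applied to a dynamic array like array of Char, it returns the size of a pointer, since a dynamic array variable is only a reference to its data. In that case, you use something like: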

  SetLength(MyCharArray, MAX_PATH + 1);
  FillChar(MyCharArray[0], Length(MyCharArray) * SizeOf(Char), 0);

Instead of MyCharArray[0], I increasingly prefer to use Pointer(MyCharArray)^, since the former can generate a range check error if MyCharArray has length 0, because in that case there is no element 0. So then the code becomes:

var
  Buffer: array of Char; // or: TArray<Char> if your compiler supports that
begin
  SetLength(Buffer, MAX_PATH + 1);
  FillChar(Pointer(Buffer)^, Length(Buffer) * SizeOf(Char), 0);

For situations where the number of characters is important, you use Length:

  StrLCopy(Buffer, PChar(MyString), Length(Buffer) - 1); // leave room for the #0
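To see the difference between SizeOf and Length for both kinds of arrays, here is a small sketch; the array sizes are arbitrary.

```pascal
program SizeOfVersusLength;

{$APPTYPE CONSOLE}

var
  Fixed: array[0..9] of Char;
  Dyn: array of Char;
begin
  SetLength(Dyn, 10);
  Writeln(Length(Fixed), ' ', SizeOf(Fixed));  // 10 characters, 10 * SizeOf(Char) bytes
  Writeln(Length(Dyn), ' ', SizeOf(Dyn));      // 10 characters, but only SizeOf(Pointer) bytes
  // Clearing both, with the byte counts computed correctly:
  FillChar(Fixed, SizeOf(Fixed), 0);
  FillChar(Pointer(Dyn)^, Length(Dyn) * SizeOf(Char), 0);
end.
```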

Further Information

There is a whitepaper by Marco Cantù, which describes the various new string types and enhancements extensively and very clearly. I recommend you download it and read it at least once.

More tips and tricks about converting your strings to Delphi 2009 and up can be found in these articles by Nick Hodges, former Delphi R&D Manager: Delphi in a Unicode World, Part 1, Part 2 and Part 3.

There is a bunch of Unicode related articles and documents on the Embarcadero Developer Network.

Conclusions

The open secrets of good design practice include the importance of knowing what to keep whole, what to combine, what to separate, and what to throw away. — Kevlin Henny

Although string and PChar are both string types, they are quite different. Strings are easier to use, whereas for PChars you must do almost everything yourself. You can use them together, and cast a string as PChar, and assign a PChar to a string, but because a string changes its address when it is changed, you should not hold on very long to the address you obtain by casting a string to a PChar. Assigning a PChar to a string is less hazardous, because a copy is made to the internal buffer.

As the previous text demonstrated, allocating text in a function and then returning a PChar to the new buffer is usually not a good idea. It is even worse if it is done across a DLL boundary, since the user may not even be able to free the memory: the DLL and the user probably use different memory managers, each with its own heap. It is also not a very good idea to use a local buffer to return text.

If you must use PChar, because a function requires it, you should use string as much as possible, and only cast to PChar when you use the string as a parameter. Using strings is much easier, and less error prone, than using the C-style string functions.

Finally

A little inaccuracy sometimes saves a ton of explanation. — H. H. Munro (Saki)

I hope I have lifted a bit of the fog regarding PChars. I have not told everything there is to be known, and perhaps even twisted the exact truth a bit (for instance, not every Delphi string is reference counted – string literals always have a reference count of -1), but those internal details are not important for the big picture, and have no bearing on the safe use and interaction of strings and PChars.

Rudy Velthuis

Standard Disclaimer for External Links

These links are being provided as a convenience and for informational purposes only; they do not constitute an endorsement or an approval of any of the products, services or opinions of the corporation or organization or individual. I bear no responsibility for the accuracy, legality or content of the external site or for that of subsequent links. Contact the external site for answers to questions regarding its content.

Disclaimer and Copyright

The coding examples presented here are for illustration purposes only. The author takes no responsibility for end-user use. All content herein is copyrighted by Rudy Velthuis, and may not be reproduced in any form without the author's permission. Source code written by Rudy Velthuis presented as download is subject to the license in the files.
