Using C object files in Delphi

C is a very widely used language, and this has made the worldwide code library for C huge. The code library for Delphi is small by comparison, so it would be nice if we could use parts of that huge library directly, without translating the entire code to Delphi. Fortunately, Delphi allows you to link compiled C object files. But there is a catch: "unsatisfied externals".

C is a simple but powerful language that gets most of its functionality from its runtime library. Almost all non-trivial C code needs some of the functions in this library. But Delphi's runtime doesn't contain these functions. So simply linking the C object file will make the linker complain about "unsatisfied external declarations". Luckily, the linker accepts any implementation of such a function, no matter in which code module it is defined. If the linker can find a function with the desired name, it can be used. You can use this to provide missing parts of the runtime yourself, in your Delphi code.

In this article I will demonstrate how to compile and link an object file into a Delphi unit, and provide the missing parts of the C runtime that it needs. For this, I will use the well-known public domain regular expression search code that Henry Spencer of the University of Toronto wrote. I only slightly modified it to make it compile with Borland's C++ compiler. Regular expressions are explained briefly in the Delphi help files, and are a way of defining nifty search patterns.

Object Files

C compilers normally generate object files, which are then linked into an executable. On 32 bit Windows, these usually have the file extension ".obj". But these come in different, incompatible formats. Microsoft's C++ compiler, and some other compatible compilers, generate object files in a slightly modified COFF format. These can't be used in Delphi. Delphi requires OMF formatted object files. There is no practical way of converting normal COFF object files to OMF, so you will need the source, and a compiler that generates OMF files.
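If you are not sure which format an existing ".obj" file has, its first bytes usually give it away. The following is only a heuristic sketch: guess_obj_format and the obj_format names are hypothetical helpers, not part of the regular expression code. It assumes that an OMF module starts with a THEADR record (type byte 0x80) and that a 32 bit COFF file starts with the machine field 0x014C stored little-endian. Anything else is reported as unknown.

```c
#include <stddef.h>

/* Hypothetical helper types and names, for illustration only. */
typedef enum { OBJ_UNKNOWN, OBJ_OMF, OBJ_COFF_I386 } obj_format;

/* Guess the object file format from the first bytes of the file.
   bytes: start of the file contents, len: number of bytes available. */
obj_format guess_obj_format(const unsigned char *bytes, size_t len)
{
    if (len >= 1 && bytes[0] == 0x80)
        return OBJ_OMF;         /* OMF THEADR record type */
    if (len >= 2 && bytes[0] == 0x4C && bytes[1] == 0x01)
        return OBJ_COFF_I386;   /* COFF machine 0x014C, little-endian */
    return OBJ_UNKNOWN;         /* other formats are not recognised here */
}
```

You would read the first few bytes of the file into a buffer and pass them to the function. A result of OBJ_COFF_I386 means the file must be recompiled from source with a compiler that produces OMF.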

Borland's C++ Builder does generate such OMF object files. But not every Delphi user has C++Builder as well. Luckily, Borland made the command line compiler that comes with Borland C++ Builder version 5 freely available. It can be downloaded from this link, if you provide some information. If you don't have it yet, get it now. Borland has already released version 6, so it is not clear how long the free version 5 compiler will remain available.

There is another limitation to what kind of files you can use. You can only use object files that are compiled as C files, not C++ files. For some reason, the Delphi linker has problems with object files that contain C++. This means that your source files must have the extension ".c" and not ".cpp". But since you can't use C++ classes directly anyway, that is not a severe limitation.

One note: C often uses library (".lib") files as well. These simply contain multiple object files, and some C compilers come with a librarian program to extract, insert, replace or simply list object files in them. In Delphi, you can't link .lib files directly. But you can use TDUMP.EXE to see what is stored in them, and the free C++ compiler comes with the TLIB.EXE librarian.

The code

I will not discuss the mechanism or use of regular expressions here. There is enough material available in books and on the Internet. But to use them with this code, you first pass a regular expression pattern to a kind of very simple compiler, which turns the textual representation into a form that can easily be interpreted by the search code. The compilation is done by the function regcomp(). To search a string for a regular expression pattern, you pass the compiled pattern and the string to the regexec() function. It returns information about whether, and where in the string, it found text matching the pattern.

The complete implementation code for the regular expression search is rather complicated and long, so I will not show that. But the header file is of course important for the Delphi code using the object file. Here it is.

/***************************************************************************/
/*                                                                         */
/* regexp.h                                                                */
/*                                                                         */
/* Copyright (c) 1986 by University of Toronto                             */
/*                                                                         */
/* This public domain file was originally written by Henry Spencer for the */
/* University of Toronto and was modified and reformatted by Rudy Velthuis */
/* for use with Borland C++ Builder 5.                                     */
/*                                                                         */
/***************************************************************************/


#ifndef REGEXP_H
#define REGEXP_H

#define RE_OK                   0
#define RE_NOTFOUND             1
#define RE_INVALIDPARAMETER     2
#define RE_EXPRESSIONTOOBIG     3
#define RE_OUTOFMEMORY          4
#define RE_TOOMANYSUBEXPS       5
#define RE_UNMATCHEDPARENS      6
#define RE_INVALIDREPEAT        7
#define RE_NESTEDREPEAT         8
#define RE_INVALIDRANGE         9
#define RE_UNMATCHEDBRACKET     10
#define RE_TRAILINGBACKSLASH    11
#define RE_INTERNAL             20
#define RE_NOPROG               30
#define RE_NOSTRING             31
#define RE_NOMAGIC              32
#define RE_NOMATCH              33
#define RE_NOEND                34
#define RE_INVALIDHANDLE        99

#define NSUBEXP  10

/*
 * The first byte of the regexp internal "program" is actually this magic
 * number; the start node begins in the second byte.
 */
#define	MAGIC	0234

#pragma pack(push, 1)

typedef struct regexp
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;              /* Internal use only. */
    char reganch;               /* Internal use only. */
    char *regmust;              /* Internal use only. */
    int regmlen;                /* Internal use only. */
    char program[1];            /* Internal use only. */
} regexp;

#ifdef __cplusplus
extern "C" {
#endif

extern int regerror;
extern regexp *regcomp(char *exp);
extern int regexec(register regexp* prog, register char *string);
extern int reggeterror(void);
extern void regseterror(int err);
extern void regdump(regexp *exp);

#ifdef __cplusplus
}
#endif
                                   
#pragma pack(pop)

#endif // REGEXP_H

The header above defines a few constant values, a structure to pass information between the regular expression code and the caller, and also between the different functions of the code, and the functions that the user can call.

The #define values that start with RE_ are constants that are returned from the functions to indicate success or an error. NSUBEXP is the number of subexpressions a regular expression may have in this implementation. The number called MAGIC is a value that must be present in each compiled regular expression. If it is missing, the structure obviously doesn't contain a valid compiled regular expression. Note that 0234 is not a decimal value. The leading zero tells the C compiler that this is an octal value. Just as hexadecimal uses 16 as its number base and decimal uses 10, octal uses 8. The decimal value is calculated this way:

0234(oct) = 2 * 8^2 + 3 * 8 + 4 = 128 + 24 + 4 = 156.
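The octal arithmetic can be checked with a few lines of C. This is a small sketch, and magic_from_digits is a hypothetical helper that is not part of regexp.h. It evaluates the digits 2, 3 and 4 in base 8 by hand, so the result can be compared with the literal the compiler sees.

```c
#define MAGIC 0234   /* leading zero: the C compiler reads this as octal */

/* Hypothetical helper: the digits 2, 3, 4 weighted by powers of 8. */
int magic_from_digits(void)
{
    return 2 * 8 * 8 + 3 * 8 + 4;   /* 128 + 24 + 4 */
}
```

Both MAGIC and magic_from_digits() evaluate to 156, which is the value a Delphi translation of the header has to use, since Delphi has no octal literals.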

The #pragma pack(push, 1) pushes the current alignment state, and sets it to bytewise alignment. #pragma pack(pop) restores the previous state. This is important, because it makes the structure compatible with Delphi's packed record.
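The effect of the packing pragma can be made visible with offsetof. The sketch below repeats the fields of the regexp structure twice, under the hypothetical names regexp_packed and regexp_default, once inside the pragma pair and once with the compiler's default alignment. The two helper functions are also assumptions made for this illustration.

```c
#include <stddef.h>

#define NSUBEXP 10

#pragma pack(push, 1)           /* save current alignment, use 1 byte */
typedef struct regexp_packed
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;
    char reganch;
    char *regmust;              /* directly follows reganch, no padding */
    int regmlen;
    char program[1];
} regexp_packed;
#pragma pack(pop)               /* restore the saved alignment */

/* The same fields with default alignment, for comparison only. */
typedef struct regexp_default
{
    char *startp[NSUBEXP];
    char *endp[NSUBEXP];
    char regstart;
    char reganch;
    char *regmust;              /* compiler may insert padding before this */
    int regmlen;
    char program[1];
} regexp_default;

size_t packed_regmust_offset(void)
{
    return offsetof(regexp_packed, regmust);
}

size_t default_regmust_offset(void)
{
    return offsetof(regexp_default, regmust);
}
```

With a 32 bit compiler, the packed offset of regmust is 82 (two arrays of ten 4 byte pointers plus two chars), while the default layout typically rounds it up to 84. A Delphi packed record expects the first layout, so a mismatch here would make Delphi read the remaining fields from the wrong addresses.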

Compiling the code

If you have C++ Builder, it is a little easier to compile the code. You create a new project, add the file "regexp.c" to it via the menu selections "Project", "Add to project", and compile the project. As a result, the directory will contain a file "regexp.obj".

If you have the command line compiler, and that is set up correctly, you open a command prompt, go to the directory that contains the file "regexp.c" and enter:
bcc32 -c regexp.c

Perhaps you'll get a warning about an unused variable, or about conversions losing significant digits, but you can ignore these in this case, since you didn't write the code anyway. I have been using this code myself for years, without any problems. After compilation, you'll find the object file "regexp.obj" in the same directory as the source file.

To import the object file in Delphi, you should now copy the object file to the directory with your Delphi source.

Importing the object file

To use the code in the object file, you'll have to write some declarations. The Delphi linker doesn't know anything about the parameters of the functions, about the regexp type in the header, or about the values that were defined in the file "regexp.h". It doesn't know what calling convention was used, either. To supply this information, you write an import unit.

Here is the interface part of the Delphi unit that is used to import the functions and values from the C object file into Delphi:

unit RegExpObj;

interface

const
  NSUBEXP = 10;

  // The first byte of the regexp internal "program" is actually this magic
  // number; the start node begins in the second byte.
  MAGIC = 156;

type
  PRegExp = ^_RegExp;
  _RegExp = packed record
    StartP: array[0..NSUBEXP - 1] of PChar;
    EndP: array[0..NSUBEXP - 1] of PChar;
    RegStart: Char;             // Internal use only.
    RegAnch: Char;              // Internal use only.
    RegMust: PChar;             // Internal use only.
    RegMLen: Integer;           // Internal use only.
    Prog: array[0..0] of Char;  // Internal use only.
  end;

function _regcomp(exp: PChar): PRegExp; cdecl;
function _regexec(prog: PRegExp; str: PChar): LongBool; cdecl;
function _reggeterror: Integer; cdecl;
procedure _regseterror(Err: Integer); cdecl;

You'll notice that all the functions got an underscore in front of them. This is because, for historic reasons, most C compilers still generate C functions with names that start with an underscore. To import them, you'll have to use the "underscored" names. You could tell the C++Builder compiler to omit the underscores, but I normally don't do that. The underscores clearly show that we are using C functions. These must be declared with the C calling convention, which is called cdecl in Delphi parlance. Forgetting this can produce bugs that are very hard to trace.

The original code of Henry Spencer didn't have the reggeterror() and regseterror() functions. I had to introduce them, because you can't use variables in the object files from the Delphi side directly, and the code requires access to reset the error value to 0, and to get the error value. But you can use Delphi variables from the C object file. Sometimes object files even require external variables to be present. If they don't exist, you can declare them somewhere in your Delphi code.
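On the C side, such accessor functions need only a few lines. The bodies below are an assumption: the header shows the prototypes of reggeterror() and regseterror() and the declaration of regerror, but the implementation in the modified regexp.c is not reproduced in this article, so take this as a minimal sketch of how it could look.

```c
#define RE_OK               0
#define RE_UNMATCHEDPARENS  6

/* The error variable declared as "extern int regerror" in regexp.h. */
int regerror = RE_OK;

/* Assumed body: return the last error code stored by the library. */
int reggeterror(void)
{
    return regerror;
}

/* Assumed body: store an error code, or RE_OK to reset the state. */
void regseterror(int err)
{
    regerror = err;
}
```

From Delphi, the pair is then used through the imported _reggeterror and _regseterror functions instead of touching the variable itself.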

Ideally, the implementation part of the unit would look like this:
implementation

uses
  SysUtils;

{$LINK 'regexp.obj'}

function _regcomp(exp: PChar): PRegExp; cdecl; external;
function _regexec(prog: PRegExp; str: PChar): LongBool; cdecl; external;
function _reggeterror: Integer; cdecl; external;
procedure _regseterror(Err: Integer); cdecl; external;
                                                                           
end.                                                                           

But if you compile that, the Delphi linker will complain about unsatisfied externals. The Delphi unit will have to provide them. Most runtime functions are simple, and can easily be coded in Delphi. Only functions that take a variable number of arguments, like printf() or scanf(), are impossible to do without resorting to assembler. Perhaps, if you could find the code of printf() or scanf() in the C++ libraries, you could extract the object file and link that file in as well. I have never tried this.

The regular expression code needs the C library functions malloc() to allocate memory, strlen() to calculate the length of a string, strchr() to find a single character in a string, strncmp() to compare a limited number of characters of two strings, and strcspn() to find the position of the first character in one string that also occurs in another string.

The first four functions are simple, and can be coded in one line of Delphi code, since Delphi has similar functions as well. But for strcspn() there is no equivalent function in the Delphi runtime library, so it must be coded by hand. Fortunately, I had (admittedly, rather ugly) C code for such a function, and I only had to translate that to Delphi. Otherwise I'd have had to read the specifications really carefully, and try to implement it myself.
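To make clear what a replacement for strcspn() has to do, here is a plain C sketch of its contract. This is not the C code mentioned above, and the name sketch_strcspn is hypothetical. It returns the number of characters at the start of s1 that do not occur in s2, which is the index of the first character of s1 that is also found in s2, or the length of s1 if there is no such character.

```c
#include <stddef.h>

/* Hypothetical name; mirrors the documented behaviour of strcspn(). */
size_t sketch_strcspn(const char *s1, const char *s2)
{
    size_t len = 0;

    /* Walk s1 until its terminator or until a character of s2 is hit. */
    for (; s1[len] != '\0'; len++)
    {
        const char *p;

        for (p = s2; *p != '\0'; p++)
            if (s1[len] == *p)
                return len;     /* first character that occurs in s2 */
    }
    return len;                 /* no match: length of s1 */
}
```

For example, sketch_strcspn("hello world", " ") gives 5, the position of the space. Any Delphi replacement must return the same values as the C library function for the same input, including for empty strings.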

The missing part of the implementation section of the unit looks like this:


// since this unit provides the code for _malloc, it can use FreeMem to free the
// PRegExp it gets. But normally, a _regfree() would be nice.
                                                                
function _malloc(Size: Cardinal): Pointer; cdecl;
begin
  GetMem(Result, Size);
end;

function _strlen(const Str: PChar): Cardinal; cdecl;
begin
  Result := StrLen(Str);
end;

function _strcspn(s1, s2: PChar): Cardinal; cdecl;
label Bye;
var
  SrchS2: PChar;
  Len: Integer;
begin
  Len := 0;
  while S1^ <> #0 do
  begin
    SrchS2 := S2;
    while SrchS2^ <> #0 do
    begin
      if S1^ = SrchS2^ then
        goto Bye;
      Inc(SrchS2);
    end;
    Inc(S1);
    Inc(Len);
  end;
Bye:
  Result := Len;
end;

function _strchr(const S: PChar; C: Integer): PChar; cdecl;
begin
  Result := StrScan(S, Chr(C));
end;

function _strncmp(S1, S2: PChar; MaxLen: Cardinal): Integer; cdecl;
begin
  Result := StrLComp(S1, S2, MaxLen);
end;

As you can see, these functions must also be declared cdecl and have a leading underscore. The function names are also case sensitive, so their correct spelling is important.

In my project, I don't use this code directly. The _RegExp structure contains information that should not be changed from outside, and is a bit awkward to use. So I wrapped it up in a few simple functions, and provided a RegFree function as well, which simply calls FreeMem, since the _malloc() I provided uses GetMem. Ideally, the regular expression code should have provided a regfree() function.

The entire C source code, the code for the import unit and the wrapper unit, as well as a very simple grep program can be found on my Downloads page.

Conclusion

Provided you have a little knowledge of C, and are not afraid to write a replacement for a few missing C runtime library functions yourself, linking C object files to a Delphi unit is easy. It allows you to create a program that does not need a DLL, and can be deployed in one piece.

If you need help with using the free C++ Builder command line compiler (compiler version 5.5), you will find excellent help in the Borland newsgroup

news://newsgroups.borland.com/borland.public.cppbuilder.commandlinetools

The newsgroup

news://newsgroups.borland.com/borland.public.cppbuilder.language

is available for questions about the language. I wish you a nice time experimenting.

Rudy Velthuis
