fredag 11. mai 2012

Accessing a Fortran-library from C

I recently had to work with a scientist who had written a nice library in his preferred language, Fortran. The library was useful, we wanted to publish the results on our webservers and had thus to connect it to apache. Unfortunately, apache isn't written in Fortran, so we had to create a C-wrapper around the Fortran code.

This starts quite simple. A subroutine like
subroutine calcThis(in, out)
real, intent(IN) :: in
real, intent(OUT):: out
can be called from C like
callthis_(&in, &out);
Note here, that fortran-functions appear in lower-case and with usually one underscore in C, and all, even input variables are given by reference.

The problem starts when asking the question: What is this real, is it float or double? The sad answer is any, because the type is defined by the compiler, or rather, the compiler options, e.g. for gcc, -fdefault-real-8 means to use real*8 internally, which can be 8bit or 16bit, depending on architecture and controlled by -fdefault-double-8. Since Fortran2003, this can be much better controlled using the ISO_C_BINDING, which allow to use explicitly C-datatypes like real, KIND(C_DOUBLE)
This was originally thought to help fortran access C-libraries, but it also helps the other way around. Unfortunately, at least in gcc-4.4, it does not check if the datatypes are used inconsistently, so it is possible to intermix a 4bit real with an 8bit real and so on.

Strings are even more tricky. While the C-String is only a character array with a final 0 (char*), a fortran string is a object, with internally defined length. While the calling convention with intermixed C is not defined, it is often (gfortran, pgf), used the following way:
fortranfunc_(char* string1, char* string2, int strLen1, int strLen2)
i.e., the length of the strings are the last arguments of the function. This can get very complicated, e.g. when variable length-arrays of strings are used, e.g.
integer, intend(IN) :: aryLen
character*256, intend(IN) :: strAry(aryLen)
Also here, the ISO_C_BINDING can help:
integer, intend(IN) :: aryLen
character, kind(LEN=1, C_CHAR) :: strAry(aryLen*256)

The length is always 1, since it sends only a pointer. The allocated length is then the length of a single string times the array-length.
Unfortunately gfortran (and I think other compilers, too), don't convert between the string-types automatically, and copying a C-string to a fortran string needs special care. On the net, I have found some Fortran functions, which should do so, but I didn't manage to get them compiled and I had to change them to Fortran subroutines to work:
SUBROUTINE C_F_STRING (C_STR, F_STRING)
USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_CHAR, C_NULL_CHAR
IMPLICIT NONE
CHARACTER(LEN=1,KIND=C_CHAR), intent(IN) :: C_STR
CHARACTER(LEN=*), intent(INOUT)          :: F_STRING ! output
INTEGER                      :: I, N
LOGICAL                      :: SETVAL
   
N = LEN(F_STRING)
SETVAL = .true.
DO I = 1, N
   IF (C_STR(I:I) .EQ. C_NULL_CHAR) THEN
       SETVAL = .false.
       F_STRING = F_STRING(:I-1)
   END IF
   IF (SETVAL) THEN
       F_STRING(I:I) = C_STR(I:I)
   ELSE
       GOTO 100
   END IF
END DO
100   CONTINUE

END subroutine C_F_STRING


SUBROUTINE F_C_STRING (F_STRING, C_STRING)
USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_CHAR, C_NULL_CHAR
IMPLICIT NONE
CHARACTER(LEN=*), INTENT(IN) :: F_STRING
CHARACTER(LEN=1,KIND=C_CHAR), INTENT(INOUT) :: C_STRING(LEN(F_STRING)+1)
INTEGER                      :: N, I

N = LEN_TRIM(F_STRING)
DO I = 1, N
   C_STRING(I) = F_STRING(I:I)
END DO
C_STRING(N + 1) = C_NULL_CHAR

END SUBROUTINE F_C_STRING

It should be noted that the C_String must have allocated 1 byte more than the fortran strings, due to the additional ending 0. The subroutines are then used like:
! initialization
CHARACTER(LEN=1, KIND=C_CHAR), INTENT(INOUT) :: rep250(251*maxrep)
CHARACTER(LEN=1, KIND=C_CHAR), INTENT(IN) :: desc100(101)
CHARACTER(LEN=250) :: f_rep250(maxrep)
CHARACTER(LEN=100) :: f_desc100

! convert to fortran strings
CALL C_F_STRING(desc100,f_desc100)
! the array case
DO i = 1, maxrep
   CALL C_F_STRING(rep250((i-1)*251+1) ,f_rep250(i))
END DO

! do something with the fortran strings
...

! convert to C-strings
CALL F_C_STRING(f_desc100, desc100)
DO i = 1, maxrep
   CALL F_C_STRING(f_rep250(i), rep250((i-1)*251+1))
END DO
Since this might require a lot of additional wrapper-code to be written, it should be automatized. In my cases, I wrote a pseudo header including tags like
FWRAP(real workWithStrings)[
                      IN integer maxrep,
                      INOUT character*250(maxrep) rep250,
                      IN character*100 desc100]
which got processed by a perl-skript, writing both the C header-file, and the f90 wrapper-functions. Good information about the ISO_C_BINDING can be found in the IBM Linux Compilers handbook.