[Fortran] pad char to int conversions with spaces instead of zeros (legacy)

Message ID 17e10b1f-a4a8-745a-0247-7bddfd90df7f@codethink.co.uk
State New
Headers show
Series
  • [Fortran] pad char to int conversions with spaces instead of zeros (legacy)
Related show

Commit Message

Mark Eggleston Dec. 4, 2018, 2:47 p.m.
Here is a patch to considered for incorporation into gfortran adding to 
its legacy support. It pads character to integer conversions using 
spaces instead of zeros when enabled.

The pad character is 'undefined' or 'processor dependent' depending on 
which standard you read. This makes it 0x20 which matches the Oracle 
Fortran compiler.

Enabled using -fdec-pad-with-spaces and -fdec.

Please find attached a patch file and text files containing change log 
entries for gcc/fortran and gcc/testsuite.

Note: I do not have write access so can not commit changes. Someone else 
will have to commit the change if it is acceptable.

There are several more patches to support legacy Fortran which will be 
sent in order.

regards,

Mark Eggleston

-- 
https://www.codethink.co.uk/privacy.html
Jim MacArthur <jim.macarthur@codethink.co.uk>
	Mark Eggleston <mark.eggleston@codethink.com>

	-fdec-pad-with-spaces
	* gcc/fortran/lang.opt: add new option.
	* gcc/fortran/options.c (set_dec_flags): add SET_BITFLAG for
	flag_dec_pad_with_spaces.
	* gcc/fortran/simplify.c (gfc_simplify_transfer): memset with
	0x20 instead of 0x0 when flag_dec_pad_with_spaces is set.
Mark Eggleston <mark.eggleston@codethink.com>

	-fdec-pad-with-spaces
	* gfortran.dg/dec-pad-with-spaces-1.f90: New test.
	* gfortran.dg/dec-pad-with-spaces-2.f90: New test.
	* gfortran.dg/dec-pad-with-spaces-3.f90: New test.

Comments

Jakub Jelinek Dec. 4, 2018, 3:11 p.m. | #1
On Tue, Dec 04, 2018 at 02:47:25PM +0000, Mark Eggleston wrote:
> Here is a patch to considered for incorporation into gfortran adding to its

> legacy support. It pads character to integer conversions using spaces

> instead of zeros when enabled.

> 

> The pad character is 'undefined' or 'processor dependent' depending on which

> standard you read. This makes it 0x20 which matches the Oracle Fortran

> compiler.


Trying fortran.godbolt.org, I think ifort pads this with spaces too.

> Enabled using -fdec-pad-with-spaces and -fdec.


Just a couple of random comments.
-fdec-pad-with-spaces option name doesn't look right, because it doesn't say
what the option affects.  So perhaps have transfer in the option name?
Wouldn't it be better to allow specifying whatever character you want to pad
with, so that users can choose something even different?

> --- a/gcc/fortran/simplify.c

> +++ b/gcc/fortran/simplify.c

> @@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)

>    /* Allocate the buffer to store the binary version of the source.  */

>    buffer_size = MAX (source_size, result_size);

>    buffer = (unsigned char*)alloca (buffer_size);

> -  memset (buffer, 0, buffer_size);

> +  memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);


This affects just the simplification when the argument is a known constant.
Shouldn't we handle it also when it is not a constant?  Like I've tried:
program test
  integer(kind=8) :: a, b, c
  character(len=4) :: e
  a = transfer("ABCE", 1_8)
  e = "ABCE"
  b = transfer(e, 1_8)
  c = transfer("ABCE    ", 1_8)
  print *, a, b, c
end
and for a the result is on little-endian indeed z'45434241', for b
the upper 32 bits are completely random:
            D.3854 = 4;
            D.3856 = 8;
            _1 = MIN_EXPR <D.3856, D.3854>;
            _2 = MAX_EXPR <_1, 0>;
            _3 = (unsigned long) _2;
            __builtin_memcpy (&transfer.0, &e, _3);
            transfer.3_4 = transfer.0;
            b = transfer.3_4;
and for c it is the padding with zeros I assume you want for -fdec.
So, what does Oracle fortran or ifort do for this b case above?

> --- /dev/null

> +++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90

> @@ -0,0 +1,17 @@

> +! { dg-do run }

> +! { dg-options "-fdec-pad-with-spaces" }

> +!

> +! Test case contributed by Mark Eggleston <mark.eggleston@codethink.com>

> +

> +program test

> +  integer(kind=8) :: a

> +  a = transfer("ABCE", 1_8)

> +  ! If a has not been converted into big endian

> +  ! or little endian integer it has failed.

> +  if ((a.ne.int(z'4142434520202020',kind=8)).and. &

> +      (a.ne.int(z'2020202045434241',kind=8))) then 


The tests look too much big-endian vs. little-endian dependent and
ascii dependent.  We have pdp-endian and doesn't s390x TPF use EBCDIC?

Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE    ", 1_8)
?

	Jakub
Fritz Reese Dec. 4, 2018, 5:04 p.m. | #2
On Tue, Dec 4, 2018 at 10:12 AM Jakub Jelinek <jakub@redhat.com> wrote:
>

> Just a couple of random comments.

> -fdec-pad-with-spaces option name doesn't look right, because it doesn't say

> what the option affects.  So perhaps have transfer in the option name?

[...]
> Wouldn't it be better to allow specifying whatever character you want to pad

> with, so that users can choose something even different?


I concur with this. In that case some option like -ftransfer-pad-char=
may be a more appriopriate name, where -fdec may enable a default
transfer-pad-char of \x20 rather than \x00.

>

> > --- a/gcc/fortran/simplify.c

> > +++ b/gcc/fortran/simplify.c

> > @@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)

[...]
> This affects just the simplification when the argument is a known constant.

> Shouldn't we handle it also when it is not a constant?


Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)
to affect non-constant resolution.

> The tests look too much big-endian vs. little-endian dependent and

> ascii dependent.  We have pdp-endian and doesn't s390x TPF use EBCDIC?

>

> Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE    ", 1_8)


Agreed.

---
Fritz
Jerry DeLisle Dec. 6, 2018, 2:27 a.m. | #3
On 12/4/18 9:04 AM, Fritz Reese wrote:
> On Tue, Dec 4, 2018 at 10:12 AM Jakub Jelinek <jakub@redhat.com> wrote:

>>

>> Just a couple of random comments.

>> -fdec-pad-with-spaces option name doesn't look right, because it doesn't say

>> what the option affects.  So perhaps have transfer in the option name?

> [...]

>> Wouldn't it be better to allow specifying whatever character you want to pad

>> with, so that users can choose something even different?

> 

> I concur with this. In that case some option like -ftransfer-pad-char=

> may be a more appriopriate name, where -fdec may enable a default

> transfer-pad-char of \x20 rather than \x00.


I disagree completely. I assume the idea of -fdec-pad-with-spaces is to 
accomodate some old dec fortran code. The only reason to use some other 
character is if someone is writing new dec fortran code, which is 
implying encouraging people to be writing non standard conforming code.

Even if it is conforming in the sense that it is processor dependent you 
are encouraging people to create new non portable code across compilers. 
Please just stay consistent with Intel.

This whole padding business just stinks to me. There are better ways to 
accomplish these results without using transfer to convert ascii strings 
into bit patterns or whatever the heck some programmer is trying to do 
here, like maybe use 'ABCE     ' if thats what is really needed. And, 
please everyone please quit fiddling with the compiler to fix problems 
in the source code. Are people so lazy or such cheapskates they won't do 
this the right way and update the damn source code if it needs to be used.

We have truly more serious and real problems/bugs in gfortran that 
people should be spending the scarce resources on and not this junk.

Jerry
Mark Eggleston Dec. 6, 2018, 10:14 a.m. | #4
On 04/12/2018 15:11, Jakub Jelinek wrote:
> On Tue, Dec 04, 2018 at 02:47:25PM +0000, Mark Eggleston wrote:

>> Here is a patch to considered for incorporation into gfortran adding to its

>> legacy support. It pads character to integer conversions using spaces

>> instead of zeros when enabled.

>>

>> The pad character is 'undefined' or 'processor dependent' depending on which

>> standard you read. This makes it 0x20 which matches the Oracle Fortran

>> compiler.

> Trying fortran.godbolt.org, I think ifort pads this with spaces too.

>

>> Enabled using -fdec-pad-with-spaces and -fdec.

> Just a couple of random comments.

> -fdec-pad-with-spaces option name doesn't look right, because it doesn't say

> what the option affects.  So perhaps have transfer in the option name?

> Wouldn't it be better to allow specifying whatever character you want to pad

> with, so that users can choose something even different?

Fritz Reese agrees with you here and suggests -ftransfer-pad-char= with 
-fdec setting it to 0x20.
>

>> --- a/gcc/fortran/simplify.c

>> +++ b/gcc/fortran/simplify.c

>> @@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)

>>     /* Allocate the buffer to store the binary version of the source.  */

>>     buffer_size = MAX (source_size, result_size);

>>     buffer = (unsigned char*)alloca (buffer_size);

>> -  memset (buffer, 0, buffer_size);

>> +  memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

> This affects just the simplification when the argument is a known constant.

> Shouldn't we handle it also when it is not a constant?  Like I've tried:

> program test

>    integer(kind=8) :: a, b, c

>    character(len=4) :: e

>    a = transfer("ABCE", 1_8)

>    e = "ABCE"

>    b = transfer(e, 1_8)

>    c = transfer("ABCE    ", 1_8)

>    print *, a, b, c

> end

> and for a the result is on little-endian indeed z'45434241', for b

> the upper 32 bits are completely random:

>              D.3854 = 4;

>              D.3856 = 8;

>              _1 = MIN_EXPR <D.3856, D.3854>;

>              _2 = MAX_EXPR <_1, 0>;

>              _3 = (unsigned long) _2;

>              __builtin_memcpy (&transfer.0, &e, _3);

>              transfer.3_4 = transfer.0;

>              b = transfer.3_4;

> and for c it is the padding with zeros I assume you want for -fdec.

Clearly insufficient testing let this through. The padding should be 
done for both literals and variables.
> So, what does Oracle fortran or ifort do for this b case above?

Don't have access to either of those compilers. It may be possible to 
check the ifort compiler.
>

>> --- /dev/null

>> +++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90

>> @@ -0,0 +1,17 @@

>> +! { dg-do run }

>> +! { dg-options "-fdec-pad-with-spaces" }

>> +!

>> +! Test case contributed by Mark Eggleston <mark.eggleston@codethink.com>

>> +

>> +program test

>> +  integer(kind=8) :: a

>> +  a = transfer("ABCE", 1_8)

>> +  ! If a has not been converted into big endian

>> +  ! or little endian integer it has failed.

>> +  if ((a.ne.int(z'4142434520202020',kind=8)).and. &

>> +      (a.ne.int(z'2020202045434241',kind=8))) then

> The tests look too much big-endian vs. little-endian dependent and

> ascii dependent.  We have pdp-endian and doesn't s390x TPF use EBCDIC?

I hadn't considered those.
>

> Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE    ", 1_8)

> ?

That simplifies things.
>

> 	Jakub


Thanks for the feedback. I inherited this patch, my addition of 
-fdec-pad-with-spaces and improved testing were insufficient.

Will resubmit the patch after taking these issues into account.

Mark.

-- 
https://www.codethink.co.uk/privacy.html
Mark Eggleston Dec. 6, 2018, 10:20 a.m. | #5
On 04/12/2018 17:04, Fritz Reese wrote:
> On Tue, Dec 4, 2018 at 10:12 AM Jakub Jelinek <jakub@redhat.com> wrote:

>> Just a couple of random comments.

>> -fdec-pad-with-spaces option name doesn't look right, because it doesn't say

>> what the option affects.  So perhaps have transfer in the option name?

> [...]

>> Wouldn't it be better to allow specifying whatever character you want to pad

>> with, so that users can choose something even different?

> I concur with this. In that case some option like -ftransfer-pad-char=

> may be a more appriopriate name, where -fdec may enable a default

> transfer-pad-char of \x20 rather than \x00.


How would the value be specified?

-ftransfer-pad-char=0x20
-ftransfer-pad-char=32
-ftransfer-pad-char=' '

If -fdec and -ftransfer-pad-char are both specified which should have 
precedence?

For -fdec and -fno-dec both specified would I be correct in assuming 0x0 
should be used?

>

>>> --- a/gcc/fortran/simplify.c

>>> +++ b/gcc/fortran/simplify.c

>>> @@ -7830,7 +7830,7 @@ gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)

> [...]

>> This affects just the simplification when the argument is a known constant.

>> Shouldn't we handle it also when it is not a constant?

> Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)

> to affect non-constant resolution.

Thanks for the hint.
>

>> The tests look too much big-endian vs. little-endian dependent and

>> ascii dependent.  We have pdp-endian and doesn't s390x TPF use EBCDIC?

>>

>> Wouldn't it be better to compare transfer("ABCE", 1_8) with transfer("ABCE    ", 1_8)

> Agreed.

>

> ---

> Fritz


Thanks for the feedback.

Mark

-- 
https://www.codethink.co.uk/privacy.html
Jakub Jelinek Dec. 6, 2018, 10:33 a.m. | #6
On Wed, Dec 05, 2018 at 06:27:00PM -0800, Jerry DeLisle wrote:
> I disagree completely. I assume the idea of -fdec-pad-with-spaces is to

> accomodate some old dec fortran code. The only reason to use some other

> character is if someone is writing new dec fortran code, which is implying

> encouraging people to be writing non standard conforming code.

> 

> Even if it is conforming in the sense that it is processor dependent you are

> encouraging people to create new non portable code across compilers. Please

> just stay consistent with Intel.


So do you prefer to always use ' ' instead of '\0', or decide based on -fdec
without a separate option controlling that?

	Jakub
Mark Eggleston Dec. 6, 2018, 10:54 a.m. | #7
On 06/12/2018 10:33, Jakub Jelinek wrote:
> On Wed, Dec 05, 2018 at 06:27:00PM -0800, Jerry DeLisle wrote:

>> I disagree completely. I assume the idea of -fdec-pad-with-spaces is to

>> accomodate some old dec fortran code. The only reason to use some other

>> character is if someone is writing new dec fortran code, which is implying

>> encouraging people to be writing non standard conforming code.

I should have made this clear.
>>

>> Even if it is conforming in the sense that it is processor dependent you are

>> encouraging people to create new non portable code across compilers. Please

>> just stay consistent with Intel.

I agree.
> So do you prefer to always use ' ' instead of '\0', or decide based on -fdec

> without a separate option controlling that?

It would be simpler to dispense with the extra option and just set space 
padding with -fdec. I think it would be odd to use something other than 
a space or '\0' for padding.
>

> 	Jakub

Mark

-- 
https://www.codethink.co.uk/privacy.html
Jerry DeLisle Dec. 7, 2018, 1:57 a.m. | #8
On 12/6/18 2:33 AM, Jakub Jelinek wrote:
> On Wed, Dec 05, 2018 at 06:27:00PM -0800, Jerry DeLisle wrote:

>> I disagree completely. I assume the idea of -fdec-pad-with-spaces is to

>> accomodate some old dec fortran code. The only reason to use some other

>> character is if someone is writing new dec fortran code, which is implying

>> encouraging people to be writing non standard conforming code.

>>

>> Even if it is conforming in the sense that it is processor dependent you are

>> encouraging people to create new non portable code across compilers. Please

>> just stay consistent with Intel.

> 

> So do you prefer to always use ' ' instead of '\0', or decide based on -fdec

> without a separate option controlling that?

> 


Keep current default of '\0' and use ' ' when -fdec given.

Regards,

Jerry
Mark Eggleston Dec. 10, 2018, 2:09 p.m. | #9
On 06/12/2018 10:20, Mark Eggleston wrote:
>> Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)

>> to affect non-constant resolution.

> Thanks for the hint.


I've looked at gfc_resolve_transfer regarding handling of padding when a 
character variable is passed to transfer instead of a literal. This 
routine is not called so can't be where a variable would be handled.

Don't yet know where to make the change.

regards,

Mark

-- 
https://www.codethink.co.uk/privacy.html
Jakub Jelinek Dec. 10, 2018, 5:12 p.m. | #10
On Mon, Dec 10, 2018 at 02:09:50PM +0000, Mark Eggleston wrote:
> 

> On 06/12/2018 10:20, Mark Eggleston wrote:

> > > Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)

> > > to affect non-constant resolution.

> > Thanks for the hint.

> 

> I've looked at gfc_resolve_transfer regarding handling of padding when a

> character variable is passed to transfer instead of a literal. This routine

> is not called so can't be where a variable would be handled.

> 

> Don't yet know where to make the change.


I think you want to change gfc_conv_intrinsic_transfer
For the scalar case, which is the one for which I've posted testcase, there
is:
  extent = fold_build2_loc (input_location, MIN_EXPR, gfc_array_index_type,
                            dest_word_len, source_bytes);
  extent = fold_build2_loc (input_location, MAX_EXPR, gfc_array_index_type,
                            extent, gfc_index_zero_node);
and later:
      tmp = build_call_expr_loc (input_location,
                             builtin_decl_explicit (BUILT_IN_MEMCPY), 3,
                             fold_convert (pvoid_type_node, tmp),
                             fold_convert (pvoid_type_node, ptr),
                             fold_convert (size_type_node, extent));
      gfc_add_expr_to_block (&se->pre, tmp);
I guess you want to add after this, guarded with flag_dec only,
code to compare (at runtime) if extent < dest_word_len and if so,
use fill_with_spaces to pad it with spaces at the end (from
that first memcpy's argument + extent dest_word_len - extent bytes),
with a comment why you are doing it.  Guess the array case will need
something similar.  But, for these runtime checks, guess you should start by
looking what the other compilers are exactly doing for the different kinds
of transfers, if they pad for non-constants at all, if they only pad if
transfer is from a character object to whatever, etc.

Looking at ifort on godbolt
function foo (e)
  integer(kind=8) :: foo
  character(len=4) :: e
  foo = transfer(e, 1_8)
end
I see it calls for_cpystr (..., 8, ..., 4, ...)
so I guess it uses standard library function to copy string into the
destination in that case, so essentially what gfc_trans_string_copy
does inline or our fstrcpy (internal library function).

	Jakub
Fritz Reese Dec. 10, 2018, 7:43 p.m. | #11
> On Mon, Dec 10, 2018 at 02:09:50PM +0000, Mark Eggleston wrote:

> >

> > On 06/12/2018 10:20, Mark Eggleston wrote:

> > > > Yes. Mark, you'll need to also patch iresolve.c (gfc_resolve_transfer)

> > > > to affect non-constant resolution.

> > > Thanks for the hint.

> >

> > I've looked at gfc_resolve_transfer regarding handling of padding when a

> > character variable is passed to transfer instead of a literal. This routine

> > is not called so can't be where a variable would be handled.

> >

> > Don't yet know where to make the change.


It actually is called through a function pointer
(gfc_intrinsic_sym->resolve) in intrinsic.c (resolve_intrinsic), as
with all intrinsics. You can find the backtrace by running f951
(`gfortran -print-prog-name=f951`) through gdb and setting a
breakpoint on gfc_resolve_transfer. That being said...

On Mon, Dec 10, 2018 at 12:12 PM Jakub Jelinek <jakub@redhat.com> wrote:
> I think you want to change gfc_conv_intrinsic_transfer


... in this case Jakub is right. I hadn't looked at the body of the
resolve function yet, but peeking at it tells me the transfer
expression doesn't contain support for padding information, so you'd
have to deal with it yourself in translation.

> I guess you want to add after this, guarded with flag_dec only,

> code to compare (at runtime) if extent < dest_word_len and if so,

> use fill_with_spaces to pad it with spaces at the end (from

> that first memcpy's argument + extent dest_word_len - extent bytes),

> with a comment why you are doing it.  Guess the array case will need

> something similar.


Alternatively you could call BUILT_IN_MEMSET(&tmpdecl, '\x20',
dest_word_len) just prior to the MEMCPY whenever flag_dec is set.

Fritz
Mark Eggleston Dec. 12, 2018, 11:37 a.m. | #12
Before delving into the code to make changes to handle the case when 
passing a variable into transfer instead of a literal I revised the the 
test cases. The results indicate to me that this patch as originally 
intended is erroneous.

When a character literal is assigned to character variable e.g.

character(len=8) :: a
a = 'ab'

The value is padded with spaces.

So any variable passed into the transfer statement will already be 
padded with spaces. So the easiest solution is to pad with spaces when a 
character literal is passed in instead of zeros as is currently the case.

Mark


-- 
https://www.codethink.co.uk/privacy.html
Jakub Jelinek Dec. 12, 2018, 11:52 a.m. | #13
On Wed, Dec 12, 2018 at 11:37:23AM +0000, Mark Eggleston wrote:
> Before delving into the code to make changes to handle the case when passing

> a variable into transfer instead of a literal I revised the the test cases.

> The results indicate to me that this patch as originally intended is

> erroneous.

> 

> When a character literal is assigned to character variable e.g.

> 

> character(len=8) :: a

> a = 'ab'

> 

> The value is padded with spaces.


Sure, that is the standard behavior of character assignments.

What we are talking about is the case when transfer is used from a smaller
object to a larger result, whether it is e.g. 'ab' literal or character(len=2)
variable.  SOURCE doesn't have to be a character scalar (or array), it can
be anything.

The Fortran standard says here:
"If the physical representation of the result is longer than that
of SOURCE, the physical representation of the leading part is that of SOURCE and the remainder is processor
dependent."
Current gfortran choice of the processor dependent is 0 if SOURCE is a
constant and uninitialized bytes if it is not a constant.

So, does Oracle fortran / ifort etc. pad with spaces only if SOURCE has
character type, or in other cases too?

What about:
  integer(kind=2) :: a
  a = -1
  print *, transfer (1_2, 1_8), transfer (a, 1_8)
end
?

	Jakub
Mark Eggleston Dec. 12, 2018, 12:06 p.m. | #14
On 12/12/2018 11:52, Jakub Jelinek wrote:
> What about:

>    integer(kind=2) :: a

>    a = -1

>    print *, transfer (1_2, 1_8), transfer (a, 1_8)

> end

> ?


I assume you meant transfer (-1_2, 1_8), the result from gfortran is 
65535 for both transfers.

I'm about to build the compiler with

   memset (buffer, 0x20, buffer_size);

instead of

   memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

and will check again, if necessary a padding variable can be used 
instead initially set to zero and changed to 0x20 when it is known that 
the source is character.

Marl

-- 
https://www.codethink.co.uk/privacy.html
Jakub Jelinek Dec. 12, 2018, 12:12 p.m. | #15
On Wed, Dec 12, 2018 at 12:06:12PM +0000, Mark Eggleston wrote:
> 

> On 12/12/2018 11:52, Jakub Jelinek wrote:

> > What about:

> >    integer(kind=2) :: a

> >    a = -1

> >    print *, transfer (1_2, 1_8), transfer (a, 1_8)

> > end

> > ?

> 

> I assume you meant transfer (-1_2, 1_8), the result from gfortran is 65535

> for both transfers.


It doesn't really matter that much, the question is what is in the upper
bits and mainly a) what you get with the vendor compilers b) what you get
with your patch.  Because by my reading if you use there 0x20, it would
print 2314885530818510847 or 2314885530818445313 or something similar.
> 

> I'm about to build the compiler with

> 

>   memset (buffer, 0x20, buffer_size);

> 

> instead of

> 

>   memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

> 

> and will check again, if necessary a padding variable can be used instead

> initially set to zero and changed to 0x20 when it is known that the source

> is character.


	Jakub
Mark Eggleston Dec. 12, 2018, 3:12 p.m. | #16
On 12/12/2018 12:06, Mark Eggleston wrote:
> I'm about to build the compiler with

>

>   memset (buffer, 0x20, buffer_size);

>

> instead of

>

>   memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);

>

> and will check again, if necessary a padding variable can be used 

> instead initially set to zero and changed to 0x20 when it is known 

> that the source is character.


It was indeed necessary to only use 0x20 for padding when the source is 
known to be character. One more check was to transfer a character(4) 
variable to an integer(8) variable, there is no space padding. I don't 
yet know whether this matches the behaviour of other compilers.

I don't currently have access to other compilers. I can have some test 
cases performed on xlf and SunStudio but won't get any answers until 
after Christmas. The answer will determine whether I have any more work 
to do.

Mark


-- 
https://www.codethink.co.uk/privacy.html

Patch

From de02d8bbab91a87c30672d1a9e1c3ecc95b800f7 Mon Sep 17 00:00:00 2001
From: Jim MacArthur <jim.macarthur@codethink.co.uk>
Date: Mon, 28 Sep 2015 16:06:30 +0100
Subject: [PATCH] Pad character-to-int conversions with spaces instead of
 zeros.

The pad character is 'undefined' or 'processor dependent' depending on which
standard you read. This makes it 0x20 which matches the Oracle Fortran
compiler.

Additions by Mark Eggleston <mark.eggleston@codethink.com>:

This feature is enabled using -fdec-pad-with-spaces. Also enabled using -fdec.
---
 gcc/fortran/lang.opt                                |  4 ++++
 gcc/fortran/options.c                               |  1 +
 gcc/fortran/simplify.c                              |  2 +-
 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90 | 17 +++++++++++++++++
 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-2.f90 | 17 +++++++++++++++++
 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-3.f90 | 17 +++++++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/dec-pad-with-spaces-3.f90

diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index dc9a94c829c..dbdea73ea1f 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -452,6 +452,10 @@  fdec-math
 Fortran Var(flag_dec_math)
 Enable legacy math intrinsics for compatibility.
 
+fdec-pad-with-spaces
+Fortran Var(flag_dec_pad_with_spaces)
+For character to integer conversions, use spaces for the pad rather than NUL.
+
 fdec-structure
 Fortran Var(flag_dec_structure)
 Enable support for DEC STRUCTURE/RECORD.
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index 48e35e3524d..7abccd28aaf 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -74,6 +74,7 @@  set_dec_flags (int value)
   SET_BITFLAG (flag_dec_static, value, value);
   SET_BITFLAG (flag_dec_math, value, value);
   SET_BITFLAG (flag_dec_include, value, value);
+  SET_BITFLAG (flag_dec_pad_with_spaces, value, value);
 }
 
 /* Finalize DEC flags.  */
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index cdf748e4990..04c2c6ea13c 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -7830,7 +7830,7 @@  gfc_simplify_transfer (gfc_expr *source, gfc_expr *mold, gfc_expr *size)
   /* Allocate the buffer to store the binary version of the source.  */
   buffer_size = MAX (source_size, result_size);
   buffer = (unsigned char*)alloca (buffer_size);
-  memset (buffer, 0, buffer_size);
+  memset (buffer, (flag_dec_pad_with_spaces ? 0x20 : 0x0), buffer_size);
 
   /* Now write source to the buffer.  */
   gfc_target_encode_expr (source, buffer, buffer_size);
diff --git a/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90 b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90
new file mode 100644
index 00000000000..c6b5302e79d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-1.f90
@@ -0,0 +1,17 @@ 
+! { dg-do run }
+! { dg-options "-fdec-pad-with-spaces" }
+!
+! Test case contributed by Mark Eggleston <mark.eggleston@codethink.com>
+
+program test
+  integer(kind=8) :: a
+  a = transfer("ABCE", 1_8)
+  ! If a has not been converted into big endian
+  ! or little endian integer it has failed.
+  if ((a.ne.int(z'4142434520202020',kind=8)).and. &
+      (a.ne.int(z'2020202045434241',kind=8))) then 
+    stop 1
+  end if
+end program test
+
+
diff --git a/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-2.f90 b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-2.f90
new file mode 100644
index 00000000000..dd1d9fb3b32
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-2.f90
@@ -0,0 +1,17 @@ 
+! { dg-do run }
+! { dg-options "-fdec" }
+!
+! Test case contributed by Mark Eggleston <mark.eggleston@codethink.com>
+
+program test
+  integer(kind=8) :: a
+  a = transfer("ABCE", 1_8)
+  ! If a has not been converted into big endian
+  ! or little endian integer it has failed.
+  if ((a.ne.int(z'4142434520202020',kind=8)).and. &
+      (a.ne.int(z'2020202045434241',kind=8))) then 
+    stop 1
+  end if
+end program test
+
+
diff --git a/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-3.f90 b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-3.f90
new file mode 100644
index 00000000000..131c5e1ceb1
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec-pad-with-spaces-3.f90
@@ -0,0 +1,17 @@ 
+! { dg-do run }
+! { dg-options "-fdec -fno-dec-pad-with-spaces" }
+!
+! Test case contributed by Mark Eggleston <mark.eggleston@codethink.com>
+
+program test
+  integer(kind=8) :: a
+  a = transfer("ABCE", 1_8)
+  ! If a has not been converted into big endian
+  ! or little endian integer it has failed.
+  if ((a.ne.int(z'4142434500000000',kind=8)).and. &
+      (a.ne.int(z'45434241',kind=8))) then 
+    stop 1
+  end if
+end program test
+
+
-- 
2.11.0