Topic: backslash


Author: shankar@sgi.com (Shankar Unni)
Date: 1995/06/29
Steve Clamage (clamage@Eng.Sun.COM) wrote:

> It sounds like you are saying that two compilers treat the sequence
>  <backslash> <blank> <newline>
> as if the <blank> were not present, and splice the two physical lines
> into one logical line. If that is the case, they are probably wrong to
> do so.

Ah, but the problem is in defining where the "new line character" is (or
even what it is).

In order to accommodate machines with fixed record sizes for ASCII files
with blank padding, the ANSI C standard, in effect, specifically blesses
the stripping of whitespace from the end of line, by leaving it up to the
implementation to decide what constitutes an "end of line" character.

On such machines, for instance, it would be almost mandatory to strip
blanks from the end of the "record", and thus the two cases would be the
same.
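
A minimal sketch of why padding forces the issue on a record-oriented system, assuming fixed-length records padded with blanks. The helper name `records_to_lines` and the record length used in the check are hypothetical, not taken from any real compiler. Since the padding cannot be told apart from blanks the programmer typed, the only workable rule is to drop everything after the last non-blank character, which makes "backslash, blanks, end of record" and "backslash, end of record" identical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a buffer of fixed-length, blank-padded records
// into lines, dropping the padding at the end of each record.
std::vector<std::string> records_to_lines(const std::string& data,
                                          std::size_t record_len) {
    std::vector<std::string> lines;
    if (record_len == 0) return lines;  // nothing sensible to do
    for (std::size_t pos = 0; pos < data.size(); pos += record_len) {
        std::string rec = data.substr(pos, record_len);
        // Padding is indistinguishable from typed blanks, so everything
        // after the last non-blank character is removed.
        std::size_t last = rec.find_last_not_of(' ');
        rec.erase(last == std::string::npos ? 0 : last + 1);
        lines.push_back(rec);
    }
    return lines;
}

// Two 8-character records; the first ends in a backslash plus padding.
void records_demo() {
    std::string data = std::string("a \\") + std::string(5, ' ') +
                       "b;" + std::string(6, ' ');
    std::vector<std::string> lines = records_to_lines(data, 8);
    assert(lines.size() == 2);
    assert(lines[0] == "a \\");  // the backslash is now last on its line
    assert(lines[1] == "b;");
}
```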

As to whether it's *reasonable* for a DOS (or Unix) compiler to strip out
trailing blanks from what are clearly variable-length records, that's up
to the implementation. Personally, I'd call them both "broken".
--
Shankar Unni    E-Mail: shankar@sgi.com
Silicon Graphics Inc.   Phone: +1-415-390-2072
URL: http://reality.sgi.com/employees/shankar





Author: mansionj@lonnds.ml.com (James Mansion LADS LDN X4923)
Date: 1995/06/29
I seem to remember that the behaviour described is undefined - or at least is
in ISO C.  This should be cleared up in the C++ Standard.

The reasoning was that for some architectures 'lines' might not be represented
as byte streams as on UNIX.  The ones it was intended for were record-
oriented systems like mainframes, but I guess that you might also consider
the CR/LF sequences on DOS etc too.  After all, on these systems the '\' is
immediately followed by <cr>, not <nl>.

I don't have my ancient C Standard draft to hand; does anyone have access
to it?  (Or, specifically, to the Rationale section?)


Personally, I think that making trailing whitespace significant is a bad idea
and that conformant compilers SHOULD compress <whitespace_char>*<nl> to <nl>.

James








Author: ccwf@locke.klab.caltech.edu (Charles Fu)
Date: 1995/06/30
In article <3sus63$3s6@fido.asd.sgi.com>,
Shankar Unni <shankar@engr.sgi.com> wrote:
>As to whether it's *reasonable* for a DOS (or Unix) compiler to strip out
>trailing blanks from what are clearly variable-length records, that's up
>to the implementation. Personally, I'd call them both "broken".

But what if, as is the case with some versions of the WATCOM compiler, the
compiler runs under both DOS and (some versions of) UNIX?  I can see that the
implementor would want to have an end of line convention that would make the
files compile identically under either OS.

-ccwf





Author: jla@to.icl.fi (Jari Laaksonen)
Date: 1995/06/28
What is your interpretation of the ARM chapter 16.1 "Phases of Preprocessing"
where it says that "Each pair of a backslash character \ immediately followed
by a new-line is deleted, with the effect that the next source line is
appended to the line that contained the sequence."

The keyword here is 'immediately' so that in the following examples the first
would be correct and the second false:

(characters <.> marking space or tab, characters <\n> marking new-line)

example 1:

// comment line 1 \<\n>
   comment line 2

during preprocessing _before_ comment elimination this would become:

// comment line 1    comment line 2


example 2:

// comment line 1 \<.><.><.><\n>
   comment line 2

during preprocessing _after_ comment elimination this should leave
'comment line 2' as a source line.

However, for example both Watcom C/C++ 10.0a and IBM VisualAge C++ 3.0 Beta
treat both cases as a multiline comment.

Also, both compilers produce similar results from the following:

char s1[] = "a\<\n>
b\<\n>
c";

char s2[] = "a\<.><.><.><.><.><\n>
b\<\n>
c";

After preprocessing this becomes:

char s1[] = "abc";
char s2[] = "abc";
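
The two readings can be put side by side in a small sketch. This is only an illustration of the splicing rule as quoted from the ARM, not the code of either compiler; the function name `splice_lines` and its `lenient` flag are made up for the example. With `lenient` false, only a backslash immediately followed by a new-line is spliced; with `lenient` true, blanks and tabs between the two are skipped as well, which matches the observed s2 result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of line splicing.
// lenient == false: splice only backslash immediately followed by '\n'.
// lenient == true : also skip blanks and tabs between backslash and '\n'.
std::string splice_lines(const std::string& src, bool lenient) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\\') {
            std::size_t j = i + 1;
            if (lenient) {
                while (j < src.size() && (src[j] == ' ' || src[j] == '\t')) ++j;
            }
            if (j < src.size() && src[j] == '\n') {
                i = j + 1;  // drop backslash, optional blanks, and new-line
                continue;
            }
        }
        out += src[i++];
    }
    return out;
}

void splice_demo() {
    const std::string s1 = "a\\\nb\\\nc";       // a\<\n>b\<\n>c
    const std::string s2 = "a\\     \nb\\\nc";  // a\<.><.><.><.><.><\n>b\<\n>c
    assert(splice_lines(s1, false) == "abc");
    assert(splice_lines(s1, true) == "abc");
    // Strict reading: the first backslash in s2 is not spliced.
    assert(splice_lines(s2, false) == "a\\     \nbc");
    // Lenient reading: same result as s1.
    assert(splice_lines(s2, true) == "abc");
}
```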


        // Albert
    email: jla@to.icl.fi
-----------=======================================






Author: clamage@Eng.Sun.COM (Steve Clamage)
Date: 1995/06/28
In article 36k@louhi.to.icl.fi, jla@to.icl.fi (Jari Laaksonen) writes:
>What is your interpretation of the ARM chapter 16.1 "Phases of Preprocessing"
>where is said that "Each pair of a backslash character \ immediately followed
>by a new-line is deleted, with the effect that the next source line is
>appended to the line that contained the sequence."
>
>The keyword here is 'immediately' so that in the following examples the first
>would be correct and the second false:

[ examples deleted ]

During preprocessing, blanks and other whitespace can be significant.
For example, the two directives
 #define a (b)
 #define a(b)
mean quite different things.
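
A small sketch of that difference. The two macros are given distinct, made-up names (`OBJ` and `FN`) because the same name cannot be defined both ways in one translation unit; the variable `b` and the helper functions are likewise only for illustration.

```cpp
#include <cassert>

static int b = 7;

// Object-like macro: the blank ends the macro name, so the replacement
// list is "(b)".
#define OBJ (b)

// Function-like macro: "(" directly after the name starts a parameter list.
// FN takes one parameter named b and has an empty replacement list.
#define FN(b)

int object_like() { return OBJ; }            // expands to: return (b);
int function_like() { return 1 FN(+ 41); }   // FN(+ 41) expands to nothing

void macro_demo() {
    assert(object_like() == 7);
    assert(function_like() == 1);
}
```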

You are correct that "immediately" here means "with no intervening
characters, blank or not".

The draft standard has slightly different wording but the same meaning.
The rule is the same as in the C standard.

It sounds like you are saying that two compilers treat the sequence
 <backslash> <blank> <newline>
as if the <blank> were not present, and splice the two physical lines
into one logical line. If that is the case, they are probably wrong to
do so.

I say "probably" because a backslash that is not part of a literal string,
not part of a character constant, and not immediately followed by a newline,
has no assigned meaning in C++. I think it requires a diagnostic.

But I suppose a compiler could treat backslash followed by blanks and tabs
up to a newline the same as backslash-newline, as an extension. It won't
break any well-formed programs, and it avoids mysterious error messages
caused by invisible trailing blanks on a line.
---
Steve Clamage, stephen.clamage@eng.sun.com







Author: JdeBP@jba.co.uk (Jonathan de Boyne Pollard)
Date: 1995/06/28
Steve Clamage (clamage@Eng.Sun.COM) wrote:
: It sounds like you are saying that two compilers treat the sequence
:  <backslash> <blank> <newline>
: as if the <blank> were not present, and splice the two physical lines
: into one logical line. If that is the case, they are probably wrong to
: do so.

I disagree.  The draft uses the phrase "end of line indicators" in phase 1
of translation.  Since the standard (rightly) doesn't say what, exactly,
constitutes an end-of-line indicator in a source file, an implementation
can validly specify anything that it likes for one, and translate that to a
newline in phase 1.

In fact, you will find that the compilers that Jari referred to *are*, in
fact, using custom end-of-line sequences, since they can handle source
files that use the DOS&OS/2 convention of CR+LF to terminate a line.

I believe that it is equally as valid for an implementation to say "an end
of line indicator is an LF or CR+LF preceded by any amount of whitespace"
as it is to say "an end of line indicator is an LF or CR+LF".

( Compare this with the strictures about what constitutes a "line" in
  stream file I/O. )

In other words, they are probably *right* to do so.
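
As a rough sketch of that argument, here is what a phase 1 mapping might look like if the implementation defines its end-of-line indicator either as "LF or CR+LF" or as "LF or CR+LF preceded by any amount of whitespace". The function name `map_end_of_line` and the `swallow_blanks` flag are invented for the example; nothing here is taken from the compilers under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical phase 1 sketch: map each end-of-line indicator to '\n'.
// An indicator is LF or CR+LF; if swallow_blanks is true, any run of blanks
// and tabs directly before it is treated as part of the indicator.
std::string map_end_of_line(const std::string& src, bool swallow_blanks) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        bool crlf = src[i] == '\r' && i + 1 < src.size() && src[i + 1] == '\n';
        if (src[i] == '\n' || crlf) {
            if (swallow_blanks) {
                while (!out.empty() && (out.back() == ' ' || out.back() == '\t'))
                    out.pop_back();
            }
            out += '\n';
            if (crlf) ++i;  // skip the LF of a CR+LF pair
        } else {
            out += src[i];
        }
    }
    return out;
}

void eol_demo() {
    const std::string src = "a \\ \t\r\nb\n";
    // Whitespace counted as part of the indicator: backslash now abuts '\n'.
    assert(map_end_of_line(src, true) == "a \\\nb\n");
    // Plain LF or CR+LF indicator: the blanks survive into phase 2.
    assert(map_end_of_line(src, false) == "a \\ \t\nb\n");
}
```

Under the first definition a later splicing phase sees backslash immediately followed by new-line, so the strict wording of the rule is satisfied even though the source file had trailing blanks.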





Author: wil@ittpub.nl (Wil Evers)
Date: 1995/06/29
In article <3sqool$36k@louhi.to.icl.fi>  writes:
> What is your interpretation of the ARM chapter 16.1 "Phases of
> Preprocessing" where is said that "Each pair of a backslash character \
> immediately followed by a new-line is deleted, with the effect that the
> next source line is appended to the line that contained the sequence."
>
> The keyword here is 'immediately' so that in the following examples the
> first would be correct and the second false:
>
> (characters <.> marking space or tab, characters <\n> marking new-line)
>
> example 1:
>
> // comment line 1 \<\n>
>    comment line 2
>
> during preprocessing _before_ comment elimination this would become as:
>
> // comment line 1    comment line 2
>
>
> example 2:
>
> // comment line 1 \<.><.><.><\n>
>    comment line 2
>
> during preprocessing _after_ comment elimination this should leave
> 'comment line 2' as a source line.
>
> However, for example both Watcom C/C++ 10.0a and IBM VisualAge C++ 3.0
> Beta treat both cases as a multiline comment.
>
> Also, both compilers produce similar results from the following:
>
> char s1[] = "a\<\n>
> b\<\n>
> c";
>
> char s2[] = "a\<.><.><.><.><.><\n>
> b\<\n>
> c";
>
> After preprocessing this becomes:
>
> char s1[] = "abc";
> char s2[] = "abc";
>
>
>         // Albert
>     email: jla@to.icl.fi
> -----------=======================================

If you ask me, both the ARM and the standard (2.1, [lex.phases]) say that line
splicing does not take place if there is whitespace between the backslash
and the newline character. In the preprocessing phase, line splicing takes
place before tokenization, which means that spaces can't be safely ignored
here. This can be a nuisance, as spaces at the end of a line are hard to
see in most editors.
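
Since the offending blanks are invisible, a small checker can help find them. The following is only a sketch; the function name `backslash_then_blanks` is made up, and stripping a trailing CR (so that CR+LF files are handled) is an assumption of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical checker: return the 1-based numbers of lines whose last
// non-blank character is a backslash followed by at least one blank or tab.
std::vector<std::size_t> backslash_then_blanks(const std::string& src) {
    std::vector<std::size_t> hits;
    std::size_t line_no = 1, start = 0;
    while (start <= src.size()) {
        std::size_t end = src.find('\n', start);
        if (end == std::string::npos) end = src.size();
        std::string line = src.substr(start, end - start);
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CR+LF
        std::size_t last = line.find_last_not_of(" \t");
        if (last != std::string::npos && line[last] == '\\' &&
            last + 1 < line.size())
            hits.push_back(line_no);
        start = end + 1;
        ++line_no;
    }
    return hits;
}

void checker_demo() {
    // Line 1: backslash + blank; line 2: clean splice; line 3: backslash + tab.
    std::vector<std::size_t> hits = backslash_then_blanks("x \\ \ny\\\nz \\\t");
    assert(hits.size() == 2);
    assert(hits[0] == 1);
    assert(hits[1] == 3);
}
```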

- Wil