
By
Walter
Alan Zintz
.
This installment of our Vi/Ex tutorial series is a diversion
from the subjects I promised at the end of the previous part --
the change is my fault, and yet it is necessary. When I blithely
suggested last time that the
R
command is just like the familiar
r
command, except for a few differences I mentioned, I was leading
you astray.
There are several differences that can cause problems in
certain uses unless you understand those differences. And you
won't really comprehend the greatest of those differences
until
you know about metacharacters in insert mode.
But as an encouragement to follow all this, consider that almost
all of what I say here about the
R
command also is valid with all the other commands that put you
into text insertion mode:
a A i I o O c s
:a :i
etcetera.
There's more to
R
than
to
r
The
r
command replaces whatever
character is presently under the cursor, so there must be some
character under the cursor for it to replace -- otherwise it just
gives you an error beep. Not so with
R
.
You can give the
R
command
on an empty line; whatever you type after that, up to the
next escape character, will take the place of that empty line
just as though you had typed past the
end of an existing line after giving an
R
command. (I was going to say ``just as though you had given an
a
command'', but I'm now very leary of
making comparisons that are incomplete without paragraphs of
explanations.) You can even start entering text into a brand-new
file via the
R
command.
The factor above can be useful in various situations; I only have
space to mention one. At times I want to type new characters to
replace blank spaces in a place where some of the lines are empty.
These do not have any blanks; no characters at all. But I do not have
to look at each line before I start typing on it, to see whether I
should use an
R
or an
a
command, because
R
will work in either case.
The
R
command is more forgiving of
your typing errors, too. Whatever character you type after an
r
is final. If you accidentally
type
d the wrong character, you can only put back what was there
by typing a
u
command, if the mistake
was the last editing command you typed, or put in the replacement
you had in mind by returning the cursor to the spot and running
another, more careful,
r
command.
But if you mistype during an
R
command, you can backspace over the error with the backspace key.
Then you can type in the character (or characters; you can back
up multiple spaces by repeating the backspace key) you should have
typed. And if you simply typed too far, you'll be glad to know
that backspacing doesn't just remove the incorrect characters, it
restores the characters that were there, either right away or as
soon as you hit the escape key. You can even backspace over
everything you've typed during this
R
command before you type escape, because the editor does not object
to a replacement string length of zero.
One caveat here, though, lest my clarification turn out to need
a clarification of its own. With either of these commands it is
possible to break a line, just by typing the return key as a
replacement character, and with the
R
command this linebreaking can be done either while actually replacing
characters or when typing on beyond the end of the existing line.
With almost all versions of the editor, it is not possible to
backspace over an inserted linebreak, even while you are still in
R
insertion mode.
The most important difference, though, is the handling of
metacharacters. Yes, text insertion utilizes metacharacters too,
quite apart from the ones that the replacement patterns in
:substitute
commands use. The
r
command recognizes hardly any of
these metacharacters, and quoting those in as literal characters is
very simple. The
R
command, though,
recognizes almost all of them, and quoting characters in with
R
is rather complicated.
Quoting in Characters
The phrase ``quoting in'' is standard terminology, but it is
rather misleading in the editor. Unlike Unix shells, the editor
does not use any of the ASCII quotation marks:
` '
"
(backquote, single and double quote) to quote
characters into a file. Instead, it uses the backslash
(``
\
'') and control-V
(``
^V
''); the latter is what you
send when you press the V key while holding the CONTROL or CTRL
key down. In either case, you quote a character in by typing the
quoting character just prior to the character you want to quote
in. So if @ is your line kill character, and you want to put
that character in the text you are typing in, you would have to
type either \@ or ^V@ to get it there. And if you want several
consecutive characters quoted in, yo
u must quote each of them
individually. That is, if you want to put @@@ into a line, you
must type either ^V@^V@^V@ or \@\@\@ to put that string there.
But \ and ^V are not always interchangeable. In many cases
either will work; but sometimes you must choose the right one.
Which one to use depends both on what character you want to quote in
and whether you're using the
r
or
R
command.
One obvious use for quoting is to insert a character that normally
erases part or all of what you've just typed in. The ASCII backspace
character, control-H, must be quoted in, and so must your own line-kill
character (@ in the example above) and your own erase character if it
is not control-H. With the
r
command
you quote in any of these with a backslash; when using
R
you may quote any of these in using
either backslash or control-V.
A pause here, to answer a question that might be
in the minds
of people who know a little about Unix internals. Ordinarily it
is the asynchronous serial terminal line (or TTY) driver that
recognizes the erase and line-kill characters and edits the input
line accordingly without including these characters in the final
result. Then, how can one enter these same input-line characters
into the edit buffer if they don't get past the TTY driver?
Because Vi/Ex places the TTY driver into a special ``raw'' mode
that ignores the line-editing characters passing them on to the
editor. Otherwise you would not be able to quote these
characters in. Also, the editor is set up to discover your erase
and line-kill characters by querying your personal environment,
and then interpret these characters as the line driver would
have. A nifty feature -- but unfortunately, the editor has no
way to let the user turn this feature off.
The editor's creators
came up with a
curious method for repeating short text insertions, where the text
to go in is a
lways the same but any outgoing text varies. They decided
that when you are in screen mode, and have just gone into typing-in-text
submode, and make Control-@ (``^@'') the first character you type in,
then the editor should insert the last piece of text you had previously
inserted (if it was not more than 128 characters long) and take you back
to command mode. Unfortunately, they never made this work as promised.
In actuality, ^@ operates anywhere in a text insertion, not just
in the first character position. What a ^@ does there depends on the
situation. If your last
c d y
command,
or one of their variants such as
s D
etcetera, removed or copied a full line of text or parts of two or
more lines, or if you haven't run one of those commands in your
current editing session, then typing ^@ is just a nuisance. It will
take you out of text input submode and probably move the cursor back
a few characters from where the input ended.
But
if you have done at least one
c d y
command or a variant, and if the very last one you did removed or
copied only a part of a single line of text, then surprise! Typing a
^@ in this case will do three things:
- Unless you typed it at the first character position on a line,
it will move the cursor back one character. This will move over the
last character you typed in if you've typed any, or over one existing
character if you type ^@ as the first character of your insertion,
but will not erase the character it passes over.
- Just to the left of the new cursor position, the editor will
insert the text that was removed or copied by your last
c d y
command or variant.
(If you went into text-insertion submode via a
c
command or a variant of it, the text
you just took out is what will be put back in.)
- Finally, the text insertion will automatically end and you will
be back in command submode, w
ith the cursor positioned at the start
of the last simple word that was inserted by the ^@ metacharacter.
Quoting a ^@ into your text isn't possible, because the editor
reserves that character for internal use and will not accept it as
itself in any file you may edit. Not that there would be any reason
to put ^@ in a file anyway: it is the ASCII character NUL, a padding
character that is routinely inserted in data streams by device
drivers, and just as routinely stripped at the receiving end, so
any ^@ characters you might add would be lost in the shuffle.
But when you are using the
R
command,
or any other command that lets you insert an indefinite amount of
text, you can quote a ^@ anyway by preceding it with a ^V. The
result will be to quote ^[Pb into your file at that point; this
being the command string the editor issues to perform the odd
operation I've detailed above.
Those of you who are skillful with the editor may wonder why
the ^@ insertion ope
rates only when your last text extraction was
a fragment of one line. After all, the
P
command by itself inserts the contents of the unnamed buffer, and
that buffer holds whatever was extracted last, be it half a line or
a hundred lines, doesn't it? The answer lies in one of the editor's
undocumented features. When you give a command to insert text, even
the
r
command that only inserts a
single character, the editor simultaneously flushes the unnamed
buffer and leaves it empty -- if and only if that buffer contained
more than a fragment of one line. So, when you entered the text
insertion mode from which ^@ operates, you emptied the unnamed
buffer unless there was only a fragment of one line in it.
At times you may want to use the
beautify
option to
the
set
command. This tells the
editor to throw away most, but not all, control characters you may
try to type in -- the exceptions usually are the ta
b (^I),
newline (^J), and form feed (^L) -- in order to keep you from
inadvertently putting in invisible control characters that will
be hard to detect later. This option is normally off, but you
can type
:se bf
to turn it on.
But even when you want most control characters thrown out, there
will be occasions when one must go in. This is not possible using a
r
command. The usual
r
technique of backslashing will
usually bite
back in this case -- the editor will interpret the control character
by acting on its control meaning rather than inserting it in the text.
Using
R
, though, you can
insert most control characters by preceding each with ^V.
Even this may not be enough. Some systems are set up so that
when certain control characters are typed in, even though preceded by
^V, the system acts on them as control characters before the editor
ever sees them. To get around this
problem, many implementations of
the editor, especially older ones, interpret an ordinary character
typed right after a ^V as a control character. That is, on these
systems, typing ^VF or ^Vf while running an
R
command inserts a ^F in the file,
just as typing ^V^F would on systems that don't have this challenge.
Readers Ask
Here are the latest questions, and my solutions, from inquiring
readers with problems you might face someday.
Tommy Spratlin writes:
Hi Walter,
In moving
files from Windows machines to UNIX, some of our users do binary
transfers which result in ^M characters in the ASCII files. Usually
they occur at the ends of individual lines and I do:
:1,$ s/^M//g
where ^M is generated by ^V^M and everything works fine to
delete these characters. I now have a new problem: I found a
file with ^M characters embedded in it, but the file is
one long
line. I need to replace them with Vi's line-end character to
split this long line into multiple lines. But I can't because
it's the same as pressing the ENTER or RETURN key in the middle
of the substitution command. How can I replace the superfluous
carriage return? We have several files like this and it's
causing problems viewing them with Web browsers.
I tried substituting a newline with the character code and the
octal code unsuccessfully, and tried the ^M as a last
unsuccessful resort.
Things aren't as complicated as you make them seem, Tommy. First
of all, Web browsers generally ignore carriage-return and/or linefeed
characters while formatting text for display. If your browser is
choking on these all-one-line files, it is probably because the lines
are too long for your browser, or for some other cause not related to
embedded ^M characters.
Now, as you have deduced, the difference between Microsoft and
Unix text file formats is that Microsoft operat
ing systems seem
to favor carriage-return followed by linefeed (^J) as the line
separator, while Unix systems use linefeed alone.
As you've discovered, you cannot directly quote a ^J into any
editor command. And yet, you put a ^J into your file every time
you hit return during text entry, although the return key on most
terminals sends a ^M character. That's the trick; the
substitute
command regards a ^M in
the input pattern as a signal to insert a ^J and discard
the ^M. So you only need to get that ^M into the replacement
pattern by typing in your command line like this:
:1,$ s/^V^M/^V^M/g
You just have to overlook the appearance of futility in this
command line, as though it were going to replace each ^M with
itself. That first ^M is in the outgoing pattern, so it matches
a real ^M. The second, in the replacement pattern, calls for a ^J
as I explained above.
However, these all-one-line files may be too long for the Vi
e
ditor, which cannot handle lines much more than a thousand
characters long in most common implementations, with shorter
limits in older versions. The editor will truncate lines that
exceed the limit, with only a minimal and rather cryptic warning.
In such cases, use the
tr
utility
to replace the ^M characters (which is a very straightforward job
with that tool), before you bring the file into the Vi
editor.
You may wonder then, how you would use the
substitute
command to put ^M
characters into your file. The answer is to backslash the
quoted-in ^M. To add a ^M at the end of every line in your file,
so as to conform it to Microsoft practice, type this command:
:%s/$/\^V^M
(Note that it is important to type the \ first, then the ^V,
followed by the ^M.) The ^V puts the immediately-following ^M
into the command line, and the backslash tells the command that
this ^M is to be considered a real one, not a me
tacharacter for
^J. In fact, these are the general principles for quoting
characters almost everywhere except in typing-in-text mode:
- Precede a character by ^V to keep that character from being
interpreted as a metacharacter at the moment you type it. In
this case, you don't want typing ^M to immediately end the
substitution command.
- Precede a character by a backslash to keep that character
from acting as a metacharacter later, when what you've typed is
interpreted by the editor -- for example, when what you have
typed in is run as a command, or interpreted as a search
pattern. This command uses a backslash to keep the command from
inserting ^J instead of ^M at the time it executes.
- When you must use both, as in this case, type the backslash
before you type the ^V. (If you think that this backslash would
then affect the immediately following ^V rather than the later
^M, remember that the ^V is not there when the backslash takes
effect. The ^V disappears as soon as it tell
s the editor to
insert the ^M in the command instead of taking the ^M as the
signal to end the command.)
Finally, you can replace linefeed characters with something
else via line mode commands, but you must use two commands and
only one of them is the
substitute
command. Suppose you need to change a short file's format from a
number of lines to the format Tommy encountered: a single line with
^M separators. That is, replace each ^J (except the last) with a
^M. (This had better be a fairly short file, because even newer
versions of the editor can't handle any lines longer than 1024
characters.)
Start by using a command similar to the one above to put
^M at the end of every line except the last. (Since these ^M
characters are to separate lines, there's no use for one at the
end of the last line.) Then use this command:
:%j!
to join all the lines into one. The ``j'' in this command line is
the shortest abbreviation
for the line mode
join
command, and the ``!'' switch at
the end of it tells the command not to insert blank space between
the lines it joins.
Thai-Nghia Dinh writes:
Hi,
I have a question (rather simple, really) but no one seem able
to know the answer. Not even the help desk (with all the Vi
gurus :) ). I'm hoping you can help me with it.
I have a text file of unknown length. Each line of the file
can be very short or very long (from 3 characters up to 1000
characters).
Within this file, I'm trying to locate (search) the
n
th
occurrence of a word.
Here are a few things I've tried:
- The simple solution would be (from visual command mode):
a
/foobar
command followed by
the
n
command typed
n
-1
times. But what if
n
is large, say 200 or
greater?)
:1,$ global /^/ /foobar/
(and it
s variations)
Nothing useful...
Can you suggest a better way?
Yes, although it involves a slightly tricky procedure. Consider
the following command string:
:$|/\<foobar\>/s//QQQ
The first command in this string takes us to the last line of
our file and -- incidentally -- displays it on our screen, which is
not important here. The second command searches forward for a line
containing ``foobar'' as a word, and starting from the last line the
search must wrap around and find the first instance in the file.
Then that second command replaces the word ``foobar'' with ``QQQ'',
leaving the cursor at the point where the substitution was made.
Now let us make an addition to the start of this command string:
:1,199g/^/$|/\<foobar\>/s//QQQ
This revised string repeats the procedure 199 times; each time the
first instance of ``foobar'' remaining in the file is the one replaced.
So we en
d up sitting on the ``QQQ'' string that replaced the 199th instance
of ``foobar''; simply typing
n
will bring
us to the 200th instance. And if we move off that 200th instance for
any reason, going to the top of the file and searching for ``foobar''
will bring us right back to it, because the first 199 are now gone.
When we are finished with that 200th ``foobar'', this command:
:%s/QQQ/foobar/g
will change those 199 ``QQQ'' strings back to ``foobar''. Of course,
if there is any chance that ``QQQ'' might occur in the document as
itself, we can choose another dummy string.
And while I'm at it, I've got another question.
How do I delete all lines beginning with a certain string,
say, !@#$ (or foobar for that matter). And a related question:
how to delete lines containing the word foobar (anywhere within
the line)?
The first command line following will solve your first problem, and
th
e second will solve your second:
:g/^foobar/d
:g/\<foobar\>/d
Next Time Around
To make room to answer two readers' questions, I had to skip
presenting three great Vi tools --
autoindent
,
abbreviate
, and
map!
-- and the effect their
metacharacters have in text-insertion mode. They'll be first up
in the next part of this tutorial.
More answers to reader questions are coming, too. I have
queries to answer about the semicolon address separator and about
yanking within macros -- and if a few more significant problems
arrive here, I'll try to fit them in, too.
And this time you won't have to wait and wait for the next
tutorial part. As I write this paragraph, I'm already in the
middle of creating the next part, so you should see it within
a month after this part appears online.
|