Vim as XML Editor: More Setup

The tools described in this chapter are less essential than those in the previous chapter, and many users can do without them.

Ruby

Ruby is a very nice object-oriented programming language from Japan. Some scripts in this howto are written in Ruby, so I recommend installing it. Alternatively, you could translate the scripts into your favourite language.

Home page: www.ruby-lang.org/en/

Make sure that you have Ruby 1.8 or the latest version:
$ ruby -v
If you don't, install it.
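
The version check can also be scripted. The following is a minimal sketch, not part of the howto's own scripts: version_at_least is a hypothetical helper, and it reads "the latest version or 1.8" as "at least 1.8", which is an assumption on my part.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed if version HAVE is >= WANT.
# (Hypothetical helper.) Compares dot-separated fields numerically,
# left to right; missing fields count as 0.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, "."); m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = (i <= n) ? h[i] + 0 : 0
      b = (i <= m) ? w[i] + 0 : 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

if command -v ruby >/dev/null 2>&1; then
  # "ruby -v" prints a line such as "ruby 1.8.1 (2003-12-25) [i686-linux]";
  # the second word is the version.
  installed=$(ruby -v | awk '{ print $2 }')
  if version_at_least "$installed" 1.8; then
    echo "ruby $installed is new enough"
  else
    echo "ruby $installed is older than 1.8"
  fi
else
  echo "ruby not found"
fi
```

Comparing the fields numerically rather than as strings matters once a field has two digits: 1.10 is newer than 1.8, although it sorts lower as text.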

On Linux

The latest stable version of Ruby is available from various places, e.g. www.ruby-lang.org/en/downloads/.

After downloading and unpacking the archive, read the section "How to compile and install" in the README.

Here's how I installed Ruby. First I installed readline-devel (I don't know whether this was necessary, since readline was already installed). Then I did the following (the output of some commands is omitted):
$ mkdir del/compile/ruby
$ cd del/compile/ruby
$ wget [...snipped...].tar.gz
$ md5sum --check
5d52c7d0e6a6eb6e3bc68d77e794898e *ruby-1.8.1.tar.gz
ruby-1.8.1.tar.gz: OK
$ tar -xzf ruby-1.8.1.tar.gz
$ cd ruby-1.8.1/
$ mkdir /home/tobi/bulk/run/ruby
$ mkdir /home/tobi/bulk/run/ruby/1_8_1
$ autoconf
$ ./configure --prefix=/home/tobi/bulk/run/ruby/1_8_1
$ make
$ make test
test succeeded
$ make install
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/ruby "$@"
.
w /home/tobi/data/commands/ruby_1.8.1
60
q
$ chmod 700 ~/data/commands/ruby_1.8.1
$ ruby_1.8.1 -v
ruby 1.8.1 (2003-12-25) [i686-linux]
$ ruby_1.8.1 test/runner.rb
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/irb "$@"
.
w /home/tobi/data/commands/irb_1.8.1
59
q
$ chmod 700 ~/data/commands/irb_1.8.1
$ irb_1.8.1
irb(main):001:0> puts 6
6
=> nil
irb(main):002:0> puts 6
6
=> nil
With the Ruby that came with my distro, readline doesn't work in IRB: [up] results in ^[[A. With the Ruby I installed, it works (although I have to hit [escape] before [up]).
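
The two ed sessions above each create a small wrapper script. The same can be sketched non-interactively with a here-document. This is an illustration under assumptions: make_wrapper is a hypothetical helper, and unlike the wrappers above it writes the target path into the wrapper when the wrapper is created, and it uses exec.

```shell
#!/bin/sh
# make_wrapper TARGET WRAPPER: write a script at WRAPPER that runs
# TARGET and passes all of its arguments through. (Hypothetical helper.)
make_wrapper() {
  target=$1
  wrapper=$2
  # The EOF delimiter is unquoted, so ${target} is expanded now;
  # \$@ is escaped so that a literal "$@" ends up in the wrapper.
  cat > "$wrapper" << EOF
#!/usr/bin/env sh
exec "${target}" "\$@"
EOF
  chmod 700 "$wrapper"
}

# Demonstration with a throwaway wrapper around basename.
demo_dir=$(mktemp -d)
make_wrapper "$(command -v basename)" "$demo_dir/base"
"$demo_dir/base" "/tmp/my file.txt" .txt    # prints: my file
rm -rf "$demo_dir"
```

The quoted "$@" is what keeps an argument containing spaces in one piece. exec replaces the wrapper's shell with the target program, so no extra shell process stays around while it runs.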

On Windows

Sorry, this howto has no information about installing Ruby on Windows.

XMLStarlet

From the web site:
"XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands."

On Linux

Here's the script that I use to install XMLStarlet:

install_xmlstar

#!/bin/bash -x

# This is just an example you could use as basis for your script.
# (do not run it without having revised and adjusted it)

# The --with-[...]-src paths must point to the libxml and libxslt
# sources.
# The sources are available after install_libxml finished, for
# example.
# Set the version numbers below.
# Be online, then do
# tobi ~/del $ ~/data/run/install_xmlstar

# this doesn't really make sense ...
av_command="antivir -rs -z"

my_home=/home/tobi

# Guard: only run as the intended user, in the intended home directory.
if [ "$HOME" != "$my_home" ]; then
  exit 1
fi
if [ "$(whoami)" != 'tobi' ]; then
  exit 1
fi

# set these:
ver_xmlstar=0.8.1
ver_libxml=2.6.5
ver_libxslt=1.1.2

run_top=${HOME}/bulk/run/xmlstar
run=${run_top}/${ver_xmlstar}
compile=${HOME}/del/compile_libxml
command=${HOME}/data/commands/xmlstar

if [ -d "$run" ]; then
  echo "${run} exists, exiting"
  exit 1
fi
# -p also creates $run_top if it is missing
mkdir -p "$run"
cd "$compile" || exit 1

######################################################################

url_xmlstar="[... snipped URL ...]\
xmlstarlet-${ver_xmlstar}.tar.gz"

file_xmlstar=$(basename "${url_xmlstar}")

if [ ! -f "download/$file_xmlstar" ]; then
  cd download || exit 1
  wget "$url_xmlstar" || exit 1
  # $av_command is deliberately unquoted so that it splits into
  # the command and its options; stop if the scan fails
  $av_command "$file_xmlstar" || exit 1
  cd ../
fi

tar -xzf "download/${file_xmlstar}" || exit 1

cd "xmlstarlet-${ver_xmlstar}" || exit 1
./configure --prefix="${run}" \
  --with-libxml-src="${compile}/libxml2-${ver_libxml}" \
  --with-libxslt-src="${compile}/libxslt-${ver_libxslt}" || exit 1
make || exit 1
make tests
make install || exit 1

######################################################################

# Write the wrapper command; an existing wrapper is overwritten.
cat > "$command" << EOF
#!/usr/bin/env sh
# may get overwritten
${run}/bin/xml "\$@"
EOF
chmod 700 "$command"

xmlstar --version

On Windows

Installation is very simple. After downloading and unzipping XMLStarlet (xmlstarlet-version-win32.zip), I added the directory containing xml.exe to the system path. This makes the system path longer and requires a restart. A wrapper batch file would avoid both, but batch files support only up to nine arguments, which is often not enough when using XMLStarlet. I think that xml is a confusingly generic name for a command, so I renamed xml.exe to xmlstar.exe.

Try it out

Caution

Whenever you filter your data through a tool, it can get corrupted. If something goes wrong, you can use u to undo the filtering.

XMLStarlet can be used to remove all nodes matching an XPath expression, e.g. all style attributes from an XHTML document. Paste the following into Vim:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>foo</title>
  </head>
  <body>
    <div style="text-align:center">
      <p id="foo" style="color:green" class="blammo">
        foo
      </p>
    </div>
  </body>
</html>
Then do
:%!xmlstar ed --delete //@style
You should get something like this:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>foo</title>
  </head>
  <body>
    <div>
      <p id="foo" class="blammo">
        foo
      </p>
    </div>
  </body>
</html>
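
The example above deletes attributes, which are in no namespace, so //@style matches them directly. The elements of this document are in the XHTML namespace, and in XPath 1.0 an unprefixed name such as //p only matches elements in no namespace. The following is a minimal sketch of deleting elements instead. It assumes that your XMLStarlet version supports the -N option of ed for binding a prefix to a namespace; strip_paragraphs and the prefix x are names chosen for illustration.

```shell
#!/bin/sh
# Pick whichever XMLStarlet command name exists on this system
# ("xmlstar" is the wrapper name used in this howto).
xs=
for name in xmlstar xmlstarlet; do
  if command -v "$name" >/dev/null 2>&1; then
    xs=$name
    break
  fi
done

# strip_paragraphs: read XHTML on stdin, write it without <p> elements.
# (Hypothetical helper.)
strip_paragraphs() {
  # -N binds the prefix "x" (an arbitrary choice) to the XHTML
  # namespace so that //x:p can match the namespaced <p> elements.
  "$xs" ed -N x=http://www.w3.org/1999/xhtml --delete '//x:p'
}

if [ -n "$xs" ]; then
  printf '%s\n' '<html xmlns="http://www.w3.org/1999/xhtml"><body><div><p>foo</p></div></body></html>' |
    strip_paragraphs
else
  echo "XMLStarlet not found, skipping"
fi
```

Under the same assumption, the equivalent filter from within Vim would be :%!xmlstar ed -N x=http://www.w3.org/1999/xhtml --delete //x:p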

Tidy

Sometimes I receive HTML files generated by Microsoft Word, and they are often very bloated. Tidy can make them around five times smaller and can help with turning them into valid XHTML. The result is not guaranteed to be good code in terms of semantics and structure, but the files become much easier to work with.

On Linux

Here's how I installed Tidy:
$ tidy -help
bash: tidy: command not found
$ cd bulk/run/
$ mkdir tidy && cd tidy
$ wget [... snipped URL ...].tgz
$ md5sum --check
476326c3d44292108111841a42bd27f6 *tidy_linux_x86.tgz
tidy_linux_x86.tgz: OK
$ tar -xzf tidy_linux_x86.tgz
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/tidy/bin/tidy "$@"
.
w /home/tobi/data/commands/tidy
54
q
$ chmod 700 ~/data/commands/tidy
$ tidy -v
HTML Tidy for Linux/x86 released on 1st November 2003
$
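
In the session above, md5sum --check reads the expected checksum line from standard input. In a script, the same check can be sketched as follows. verify_md5 is a hypothetical helper, and the --status option assumes the GNU coreutils version of md5sum.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED: succeed only if the MD5 checksum of FILE
# equals EXPECTED. (Hypothetical helper.)
verify_md5() {
  # md5sum --check expects lines of the form "<hash> *<file>", as typed
  # in the session above; --status suppresses the output and reports
  # the result through the exit status only.
  printf '%s *%s\n' "$2" "$1" | md5sum --check --status
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf 'hello\n' > "$demo"
good=$(md5sum "$demo" | awk '{ print $1 }')
verify_md5 "$demo" "$good" && echo "checksum OK"
verify_md5 "$demo" 00000000000000000000000000000000 || echo "checksum mismatch"
rm -f "$demo"
```

In an install script, a line such as verify_md5 "$file" "$expected" || exit 1 stops the run before a damaged download gets unpacked.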

On Windows

A tidy.bat could look like this. The file has only two lines, and the long command must stay on a single line:

tidy.bat

@echo off
\path\to\tidy.exe -config /path/to/tidyrc.txt -f /log/errors/here/tidyerrs.txt %1 %2 %3 %4 %5 %6 %7 %8 %9
Put it in a directory that is on the system path.

Settings

Sample tidyrc.txt:
word-2000: yes
clean: yes
doctype: strict
bare: yes
drop-font-tags: yes
drop-proprietary-attributes: yes
enclose-block-text: yes
escape-cdata: yes
logical-emphasis: yes
output-xhtml: yes
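
A settings file like this is passed to Tidy with -config, and messages are sent to a log file with -f, as in the tidy.bat above. The following is a minimal sketch of using Tidy as a filter on Linux. tidy_filter and the file names are hypothetical, and the demonstration writes its own two-line settings file because newer Tidy releases may not accept every option in the sample.

```shell
#!/bin/sh
# tidy_filter CONFIG ERRLOG: run Tidy as a stdin-to-stdout filter.
# (Hypothetical helper.) Tidy exits with 0 for clean input, 1 if there
# were warnings and 2 if there were errors; only errors count as
# failure here.
tidy_filter() {
  tidy -config "$1" -f "$2"
  status=$?
  [ "$status" -lt 2 ]
}

if command -v tidy >/dev/null 2>&1; then
  work=$(mktemp -d)
  # A reduced settings file for the demonstration.
  printf 'output-xhtml: yes\nclean: yes\n' > "$work/tidyrc.txt"
  printf '%s\n' '<title>t</title><p>foo' |
    tidy_filter "$work/tidyrc.txt" "$work/tidyerrs.txt"
  echo "tidy_filter returned $?; messages are in $work/tidyerrs.txt"
else
  echo "tidy not found, skipping"
fi
```

Treating warnings as success is a design choice: Word output rarely passes without warnings, and the log file still records them for later inspection.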