Automatic Updating of the Interwiki Table

The InterWiki Map charts interwiki link prefixes. All of these prefixes are read by MediaWiki from the interwiki table in the website's database. The script outlined on this page simply makes it easier to maintain a set of site-wide interwiki links: it provides a means to edit the interwiki table using standard wiki markup, without requiring any knowledge of MySQL.

The Script

The script is organized in three parts:

  • The bash script, "UpdateInterwiki.sh": parses the most current collection of interwiki links posted at Interwiki map, reformats the content to suit MySQL, saves the result to a text file, then calls a separate MySQL script to update the database.
  • The MySQL script, "TableUpload.sql": refreshes the interwiki table with any new updates.
  • The command line in the cron file that executes the script at whatever predetermined time.


I have chosen to organize the scripts in their own folder in the maintenance directory.

 ~/appropedia.org/maintenance/Interwiki/

Bash Script

~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh

#!/bin/bash
w3m -dump -cols 160 http://appropedia.org/Interwiki_map | gawk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' | gawk '/http:/ { print $1 "\t" $2 }' >~/appropedia.org/maintenance/Interwiki/interwikitable.txt
mysql -u***** -p***** -h mysql.appropedia.org appropedia <~/appropedia.org/maintenance/Interwiki/TableUpload.sql >tableupload.log

Line-by-line breakdown:

  1. The shebang line designates a bash script.
  2. Calls w3m, a text-based web browser, to read the Interwiki map web page and pipes the content to gawk, a text-processing tool. This first gawk command isolates the web page text we are interested in, between "Please do not change" and "The wikis below are not part of the interwiki map", and pipes that content to a second gawk command. The second gawk command takes every line that contains "http:" and prints the prefix, followed by its corresponding address, delimited by a tab. The result is a properly formatted collection of table data, ready to be loaded into the interwiki table, which is saved to a file titled "interwikitable.txt".
  3. Temporarily opens our database in the mysql shell and executes the "TableUpload.sql" MySQL script, then writes errors, if any, to the "tableupload.log" file.

mySQL Subscript

~/appropedia.org/maintenance/Interwiki/TableUpload.sql

DELETE FROM w1interwiki;
LOAD DATA LOCAL INFILE '/home/lonny1/appropedia.org/maintenance/Interwiki/interwikitable.txt' INTO TABLE w1interwiki;

Line-by-line breakdown:

  1. Clears the interwiki table.
  2. Loads the properly formatted contents of the "interwikitable.txt" file into the interwiki table.
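For reference, this sketch shows what a valid interwikitable.txt looks like in MySQL's default LOAD DATA format (the two entries are hypothetical); cat -A makes the tab delimiters visible:

```shell
# Write two hypothetical entries: one per line, fields separated by a tab
printf 'Wikipedia\thttp://en.wikipedia.org/wiki/$1\nWikiTravel\thttp://wikitravel.org/en/$1\n' > interwikitable.txt
# GNU cat -A shows tabs as ^I and line ends as $
cat -A interwikitable.txt
```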

crontab

~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh

Set the execution frequency to whatever seems appropriate. We have it run once daily.
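For example, a crontab entry along these lines runs the updater once a day (the 04:15 schedule is purely illustrative; the path matches the one used in TableUpload.sql):

```
# Hypothetical schedule: run the updater once daily at 04:15
15 4 * * * /home/lonny1/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
```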

Disclaimer

This script is kinda cute. But really silly for the following reasons:

  1. It's ridiculous that I am using a web browser to pull content from our own database. This script would be way more slick if it didn't take such a circuitous scenic drive to the data. I just don't know a lick of PHP, so I don't have a clue about the specific methods MediaWiki uses to pull content from the database directly in order to build the site's web pages. Not that PHP would be needed in this script -- it would just help me figure out where specifically the Interwiki_Map page goes in the database to get its content.
  2. It's ridiculous that this script is run every single day when months usually go by between changes to the Interwiki Map. The redundancy bugs me -- not that it really matters much. But there must be some page-specific hook, or something similar, that could trigger this update script only when the Interwiki Map page has actually been edited.


It's crude; but it's simple and it works.

If anyone out there improves on this script for their own site, please show me what you changed. Or, if you simply know what I could do instead, and are able to communicate it, let me know. It would greatly satisfy me to make this better.


Table Format Conversions

All commands are written for the Unix environment.

White Space to Tab Delimited

Populating the interwiki table requires specific formatting, called out in the MySQL command LOAD DATA ... INTO TABLE. The script used for appropedia.org uses MySQL's default format: one entry per line, with fields delimited by tabs.

gawk '/ http:/ { print $1 "\t" $2 }' file.in >file.out

This command line requires that each interwiki table entry is on its own line, that the prefix is separated from its address by some amount of white space, and that the prefix comes first and the address second. This is the format in which Meatball Wiki provides their InterMap text file.

White Space to Wiki Code

If you need to do the reverse, copy the contents from the interwiki table (tab or white space delimited) and convert it to a table in wiki code, use this:

gawk 'BEGIN { print "{| class=\"plainlinks\"" } ; { print "| " $1 " || " $2 "\n|-" } ; END { print "|}" }' file.in >file.out

This command requires input similar to what is produced by the 'White Space to Tab Delimited' command above -- that is, fields separated by tabs and/or multiple spaces (any white space).


Included by Mediawiki

FYI: a MediaWiki installation includes a few collections of interwiki prefixes in the maintenance folder.

The files MediaWiki includes are:

interwiki.sql
wikipedia-interwiki.sql
wiktionary-interwiki.sql

If you're happy with any of these collections and have no need to regularly maintain your own, it's fairly simple to load them into your interwiki table. They need to be slightly edited first.

Editing Included Interwiki files

Everything above the line

REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES

needs to be deleted, and your own interwiki table name needs to replace theirs. In appropedia's case:

REPLACE INTO /*$wgDBprefix*/w1interwiki (iw_prefix,iw_url,iw_local) VALUES
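The edit can also be scripted with a couple of sed calls: print from the REPLACE INTO line onward, then swap in your own table name (w1interwiki here, per appropedia's setup; adjust for your site). A sketch against a stub file standing in for the real interwiki.sql:

```shell
# A stub standing in for maintenance/interwiki.sql (hypothetical contents)
printf '%s\n' \
  '-- header comment to be discarded' \
  'REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES' \
  "('wikipedia','http://en.wikipedia.org/wiki/',1);" > interwiki.sql

# Keep the REPLACE INTO line and everything after it, then rename the table
sed -n '/^REPLACE INTO/,$p' interwiki.sql |
  sed 's|\*/interwiki|*/w1interwiki|' > my-interwiki.sql
```

The result can then be fed to mysql the same way TableUpload.sql is in the main script.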