Appropedia:Admin tasks/Interwiki map

Automatic Updating of the Interwiki Table

The Interwiki map lists the interwiki link prefixes used on this site. MediaWiki looks up every prefix in the interwiki table of the website's database. The script outlined on this page simply makes it easier to maintain a set of site-wide interwiki links: it lets you edit the interwiki table using standard wiki markup, without needing any knowledge of mySQL.

The Script

The script is organized in three parts:

  • The bash script, "UpdateInterwiki.sh": This script parses the most current collection of interwiki links posted at Interwiki map, reformats the content to suit mySQL, saves the result to a text file, then calls a separate mySQL script to update the database.
  • The mySQL script, "TableUpload.sql": Refreshes the interwiki table with any new updates.
  • The command line in the cron file that executes the bash script at a predetermined time.


I have chosen to organize the scripts in their own folder in the maintenance directory.

 ~/appropedia.org/maintenance/interwiki/

Bash Script

~/appropedia.org/maintenance/interwiki/UpdateInterwiki.sh

#!/bin/bash
w3m -dump -cols 160 http://appropedia.org/Interwiki_map | gawk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' | gawk '/http:/ { print $1 "\t" $2 }' >~/appropedia.org/maintenance/interwiki/interwikitable.txt
mysql -u***** -p***** -h mysql.appropedia.org appropedia <~/appropedia.org/maintenance/interwiki/TableUpload.sql >tableupload.log 2>&1

Line-by-line breakdown:

  1. Designates a bash script.
  2. Calls w3m, a text-based web browser, to read the Interwiki map web page and pipes the content to gawk, a text-processing tool. The first gawk command isolates the web page text we are interested in, between "Please do not change" and "The wikis below are not part of the interwiki map", and pipes that content to a second gawk command. The second gawk command takes every line that contains "http:" and prints the prefix followed by its corresponding address, separated by a tab. The result is properly formatted table data, ready to be loaded into the interwiki table, which is saved to a file named "interwikitable.txt".
  3. Connects to our database with the mysql client and executes the "TableUpload.sql" mySQL script, then writes its output, if any, to the "tableupload.log" file.
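
To see what the two filtering stages do without fetching the live page, the pipeline can be tried on a few lines of sample text. This is only a sketch: the function name extract_interwiki and the sample page text are made up for illustration, and plain awk is used in place of gawk on the assumption that it behaves the same for these two patterns.

```shell
#!/bin/bash
# Hypothetical helper: the same two filtering stages as UpdateInterwiki.sh,
# reading the page text from stdin instead of from w3m.
extract_interwiki() {
  # Stage 1: keep only the lines between the two marker strings.
  awk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' |
    # Stage 2: keep lines containing "http:" and emit "prefix<TAB>address".
    awk '/http:/ { print $1 "\t" $2 }'
}

# Made-up sample of what "w3m -dump" might return for the page.
sample_page='Interwiki map
Please do not change the format of this table.
wikipedia    http://en.wikipedia.org/wiki/$1
example      http://example.org/wiki/$1
The wikis below are not part of the interwiki map
ignored      http://example.net/$1'

printf '%s\n' "$sample_page" | extract_interwiki
```

With this sample, only the "wikipedia" and "example" rows should come out, each as a prefix and an address separated by a single tab; the "ignored" row falls after the closing marker and is dropped.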

mySQL Subscript

~/appropedia.org/maintenance/interwiki/TableUpload.sql

DELETE FROM w1interwiki;
LOAD DATA LOCAL INFILE '/home/lonny1/appropedia.org/maintenance/interwiki/interwikitable.txt' INTO TABLE w1interwiki;

Line-by-line breakdown:

  1. Clears the interwiki table.
  2. Loads the properly formatted contents of the "interwikitable.txt" file into the interwiki table.
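
Because the table is cleared before the file is loaded, an empty or malformed "interwikitable.txt" (for example, if the page could not be fetched) would presumably leave the wiki with no interwiki prefixes until the next good run. One possible safeguard, which is not part of the original scripts, is to check the file before calling mysql. The function name table_file_ok below is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check; not part of the original scripts.
# Succeeds only if the file is non-empty and every line has exactly
# two non-empty, tab-separated fields (prefix and address).
table_file_ok() {
  local file="$1"
  [ -s "$file" ] || return 1
  awk -F '\t' '
    NF != 2 || $1 == "" || $2 == "" { bad = 1 }
    END { exit bad + 0 }
  ' "$file"
}

# Possible use inside UpdateInterwiki.sh (left as a comment here):
#   if table_file_ok ~/appropedia.org/maintenance/interwiki/interwikitable.txt; then
#     mysql ... <TableUpload.sql
#   fi
```

The check is deliberately loose: it verifies only the shape of each line, not whether the addresses are valid.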

crontab

~/appropedia.org/maintenance/interwiki/UpdateInterwiki.sh

Set the execution frequency to whatever seems appropriate. We have it run once daily.
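
For anyone unfamiliar with crontab syntax, a full entry is five schedule fields followed by the command. The sketch below just assembles and prints such a line so it can be pasted into "crontab -e". The 03:00 run time is an assumption; the only thing stated above is that the script runs once daily.

```shell
#!/bin/bash
# Assumed schedule: minute 0, hour 3, every day (i.e. daily at 03:00).
cron_schedule='0 3 * * *'
# The command as shown above; the leading ~ is left for cron's shell to expand.
cron_command='~/appropedia.org/maintenance/interwiki/UpdateInterwiki.sh'
cron_line="$cron_schedule $cron_command"

# Print the finished entry for pasting into "crontab -e".
printf '%s\n' "$cron_line"
```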

Simplified Installation and Setup

  1. Download the generic scripts here: Media:Interwiki.zip
  2. Open the generic script files with a text editor and customize them. For the customization you have to know your:
    • mySQL host name along with its corresponding username and password
    • the database name that mediawiki was installed on.
  3. Create a new wiki page called Interwiki_map.
    • Ours is here: Interwiki_map
    • If you want, copy some of our wiki code to get started; it's included in the download as InterwikiMapPageCode.txt
  4. Edit the two script files, replacing the generic lines with your site-specific information:
    • TableUpload.sql
    • UpdateInterwiki.sh
      • Please note that this script scans for interwiki links between a set of strings on the Interwiki map page so as not to include unwanted content. On our page these strings frame the content we want: "Please do not change" to "The wikis below are not part of the interwiki map". If you create your own Interwiki map page, make sure to adjust the corresponding command in the script:
        • | gawk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' |
  5. From SSH or FTP, create a new directory for the scripts called "interwiki":
    • mkdir ~/YOUR_MEDIAWIKI_INSTALLATION_DIRECTORY/maintenance/interwiki
  6. Upload the two script files to the newly created directory.
  7. Change the permissions on both script files:
    • chmod 755 FILE.NAME
  8. Run the update script to test it out:
    • ./UpdateInterwiki.sh
  9. Check the newly created interwikitable.txt to confirm that the content was converted correctly.
  10. Check the functionality of interwiki links on your wiki, using a prefix and a page that you know work (for example, one from Wikipedia).
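
If you have shell access and the customized files are already on the server, steps 5 to 7 can be rolled into one small helper. This is a sketch under that assumption; the function name install_interwiki_scripts and its two arguments are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper covering steps 5 to 7: create the directory,
# copy both scripts into it, and set their permissions to 755.
install_interwiki_scripts() {
  local src_dir="$1"    # where the customized scripts currently live
  local wiki_dir="$2"   # your MediaWiki installation directory
  local dest="$wiki_dir/maintenance/interwiki"

  mkdir -p "$dest" || return 1
  cp "$src_dir/UpdateInterwiki.sh" "$src_dir/TableUpload.sql" "$dest/" || return 1
  chmod 755 "$dest/UpdateInterwiki.sh" "$dest/TableUpload.sql"
}

# Example call (paths are placeholders):
#   install_interwiki_scripts ~/downloads/Interwiki ~/YOUR_MEDIAWIKI_INSTALLATION_DIRECTORY
```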

Run the update script whenever you have made changes to the table on your Interwiki map page, or schedule it to run periodically in your cron settings.

Disclaimer

This script is kinda cute. But really silly for the following reasons:

  1. It's ridiculous that I am using a web browser to pull content from our own database. This script would be way more slick if it didn't take such a circuitous scenic drive to the data. I just don't know a lick of PHP, the language MediaWiki is written in, so I don't have a clue about the specific methods MediaWiki uses to pull content from the database directly in order to build the site's web pages. Not that PHP would be needed in this script; it would just help me figure out where specifically the Interwiki_Map page goes in the database to get its content.
  2. It's ridiculous that this script is run every single day when months usually go by between changes to the Interwiki Map. The redundancy bugs me -- not that it really matters much. But there must be some page specific hook script or something that would trigger this update script, only at times when the Interwiki Map page has been edited.


It's crude; but it's simple and it works.

If anyone out there improves on this script for their own site, please show me what you changed. Or, if you simply know what I could do instead, and are able to communicate it, let me know. It would greatly satisfy me to make this better.


Table Format Conversions

All commands are written for a Unix environment.

White Space to Tab Delimited

Populating the interwiki table requires specific formatting, as described for the mysql statement LOAD DATA INFILE ... INTO TABLE. The script used for appropedia.org uses mysql's default format: one entry per line, with fields separated by tabs.

gawk '/ http:/ { print $1 "\t" $2 }' file.in >file.out

This command requires that each interwiki table entry is on its own line and that the prefix is separated from its address by some amount of white space. The prefix must also be the first field and the address the second. This is the format in which Meatball Wiki provides its InterMap text file.

White Space to Wiki Code

If you need to do the reverse, that is, copy the contents of the interwiki table (tab or white space delimited) and convert them to a table in wiki code, use this:

gawk 'BEGIN { print "{| class=\"plainlinks\"" } ; { print "| " $1 " || " $2 "\n|-" } ; END { print "|}" }' file.in >file.out

This command requires input similar to what is produced by the 'White Space to Tab Delimited' command above; that is, fields separated by tabs and/or multiple spaces (any white space).
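
As a quick check of what the conversion produces, the same BEGIN / per-line / END structure can be run on a single sample row. The function name to_wiki_table is made up for illustration, and plain awk is assumed to behave like gawk here.

```shell
#!/bin/bash
# Hypothetical wrapper around the conversion, reading rows from stdin.
to_wiki_table() {
  awk 'BEGIN { print "{| class=\"plainlinks\"" }   # open the wiki table
       { print "| " $1 " || " $2 "\n|-" }          # one row per input line
       END { print "|}" }'                         # close the wiki table
}

# One sample row: prefix, a tab, then the address.
printf 'wikipedia\thttp://en.wikipedia.org/wiki/$1\n' | to_wiki_table
```

For the sample row this should print four lines: the table opener, the row itself with the two cells joined by " || ", a "|-" row separator, and the closing "|}".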


Included by Mediawiki

FYI: The MediaWiki installation includes a few collections of interwiki prefixes in the maintenance folder.

The files MediaWiki includes are:

interwiki.sql
wikipedia-interwiki.sql
wiktionary-interwiki.sql

If you're content with the content of any of these collections and you have no need to regularly maintain your own collection, it's fairly simple to call them into your interwiki table. They need to be slightly edited first.

Editing Included Interwiki files

Everything above the line

REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES

needs to be deleted. And your own interwiki table name needs to replace theirs. In appropedia's case:

REPLACE INTO /*$wgDBprefix*/w1interwiki (iw_prefix,iw_url,iw_local) VALUES
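
The two manual edits described above could also be scripted. The sketch below is one way to do it, assuming the file contains a single line beginning with "REPLACE INTO" as shown; the function name prepare_interwiki_sql is hypothetical, and the sample input (including its data row) is made up for illustration rather than copied from the real file.

```shell
#!/bin/bash
# Hypothetical helper: drop everything above the REPLACE INTO line and
# swap in your own table name. Reads the .sql file on stdin.
prepare_interwiki_sql() {
  awk -v table="$1" '
    /^REPLACE INTO/ { found = 1; sub(/\*\/interwiki /, "*/" table " ") }
    found { print }
  '
}

# Made-up sample input standing in for one of the bundled files.
prepare_interwiki_sql w1interwiki <<'EOF'
-- header comments that should be removed
REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES
('wikipedia','http://en.wikipedia.org/wiki/$1',1);
EOF
```

In practice the input would be redirected from one of the bundled files and the output saved to a new file, which can then be fed to the mysql client.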