(Complete Revision)
Line 1: Line 1:
==Initial Population==
==Automatic Updating of the Interwiki Table==
===Meatball Wiki===
The InterWiki Map charts interwiki link prefixes. MediaWiki reads all of the prefixes from the interwiki table in the website's database. The script outlined on this page simply makes it easier to maintain a set of site-wide interwiki links. It provides a means to edit the interwiki table using standard wiki markup, without having to bother with or have any knowledge of mySQL.
MediaWiki looks to Meatball Wiki for the basic, current Interwiki mapping list. Find the current info here, [http://www.usemod.com/cgi-bin/mb.pl?InterMapTxt http://www.usemod.com/cgi-bin/mb.pl?InterMapTxt]
Download the text file here, [http://usemod.com/intermap.txt http://usemod.com/intermap.txt]
Use the format conversion command found [[#Table Format Conversions|below]] and paste the result into the [[Interwiki map]] edit page.


===Included by Mediawiki===
==The Script==
Additionally, Mediawiki includes a couple files in the maintenance folder. However, they contain only the short prefixes. Also, they need to be edited slightly before calling them into the mySQL interwiki table.
The script is organized in three parts:
*The bash script, "UpdateInterwiki.sh": This script parses the most current collection of interwiki links posted at [[Interwiki map]], reformats the content to suit mySQL and saves the result to a text file, then calls a separate mySQL script to update the database.
*The mySQL script, "TableUpload.sql": Refreshes the interwiki table with any new updates.
*The command line in the cron file that executes the bash script at a predetermined time.


====Editing Included Interwiki files====
Everything above the line
REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES
needs to be deleted. And our own interwiki table name needs to replace theirs, making it,
REPLACE INTO /*$wgDBprefix*/w1interwiki (iw_prefix,iw_url,iw_local) VALUES


The files Mediawiki includes are:
I have chosen to organize the scripts in their own folder in the maintenance directory.
:interwiki.sql
  ~/appropedia.org/maintenance/Interwiki/
:wikipedia-interwiki.sql
 
:wiktionary-interwiki.sql
===Bash Script===
~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
#!/bin/bash
w3m -dump -cols 160 http://appropedia.org/Interwiki_map | gawk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' | gawk '/http:/ { print $1 "\t" $2 }' >~/appropedia.org/maintenance/Interwiki/interwikitable.txt
mysql -u***** -p***** -h mysql.appropedia.org appropedia <~/appropedia.org/maintenance/Interwiki/TableUpload.sql >tableupload.log 2>&1


If any of these files are included in the [[#Update Script|script]] outlined below, you have to re-edit them whenever a [[Appropedia:Admin_Tasks/Upgrade_checklist|Mediawiki Version Upgrade]] is conducted, because they get overwritten in the process.
Line-by-line breakdown:
#Designates a bash script.
#Calls w3m, a text-based web browser, to read the Interwiki map web page and pipes the content to gawk, a text-processing tool. The first gawk command isolates the web page text we are interested in, between "Please do not change" and "The wikis below are not part of the interwiki map", and pipes that content to a second gawk command. The second gawk command takes every line that contains "http:" and prints the prefix followed by its corresponding address, separated by a tab. The result is a properly formatted collection of table data, ready to be loaded into the interwiki table, which is saved to a file titled "interwikitable.txt".
#Temporarily opens our database in the mysql shell and executes the "TableUpload.sql" mysql script. Any output, including error messages, is written to the "tableupload.log" file.


I have only included wikipedia-interwiki.sql in the script because it contains all the links to Wikipedia's foreign language pages. I didn't include interwiki.sql because the Meatball version is better. And I didn't include wiktionary-interwiki.sql because its contents seemed unnecessary at this time. We may want to include it in the script in the future. You can do so by adding this line to the end of the script
===mySQL Subscript===
  mysql -u**** -p**** -h mysql.whatissustainability.org appropedia </home/lonny1/appropedia.org/maintenance/wiktionary-interwiki.sql;
~/appropedia.org/maintenance/Interwiki/TableUpload.sql
just under the line to add wikipedia-interwiki.sql.
  DELETE FROM w1interwiki;
LOAD DATA LOCAL INFILE '/home/lonny1/appropedia.org/maintenance/Interwiki/interwikitable.txt' INTO TABLE w1interwiki;


==Update Script==
Line-by-line breakdown:
This script updates the Interwiki tables in our database by parsing the most current tables posted at [[Interwiki map]].
#Clears the interwiki table.
#Loads the properly formatted contents of the "interwikitable.txt" file into the interwiki table.


The script lives here:
===crontab===
  ~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
  ~/appropedia.org/maintenance/interwiki/UpdateInterwiki.sh


Its contents:
Set the execution frequency to whatever seems appropriate. We have it run once daily.
#!/bin/bash
# Grab interwiki map from website
lynx -dump -nolist -width=160 http://appropedia.org/Interwiki_map | gawk '/ http:/ { print $1 "\t" $2 }' >~/appropedia.org/maintenance/Interwiki/interwikitable.txt
# Run mySQL batch script using batchfile TableUpload.sql. An empty tableupload.log should mean no errors.
mysql -u**** -p**** -h mysql.whatissustainability.org appropedia </home/lonny1/appropedia.org/maintenance/Interwiki/TableUpload.sql >tableupload.log;
# Run mediawiki included mySQL batch scripts for the standard Wikimedia Interwiki prefixes
# Make sure these have been modified to our database naming scheme and have had the header notes removed
mysql -u**** -p**** -h mysql.whatissustainability.org appropedia </home/lonny1/appropedia.org/maintenance/wikipedia-interwiki.sql;


===mySQL Subscript===
==Disclaimer==
The script lives here:
This script is kinda cute. But really silly for the following reasons:
~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
#It's ridiculous that I am using a web browser to pull content from our own database. This script would be way more slick if it didn't take such a circuitous scenic drive to the data. I just don't know a lick of PHP, the language MediaWiki is written in. So, I don't have a clue about the specific methods MediaWiki uses to pull content from the database directly in order to build the site's web pages. Not that PHP would be needed in this script -- it would just help me figure out where specifically the Interwiki_Map page goes in the database to get its content.
#It's ridiculous that this script is run every single day when months usually go by between changes to the Interwiki Map. The redundancy bugs me -- not that it really matters much. But there must be some page specific hook script or something that would trigger this update script, only at times when the Interwiki Map page has been edited.  


Its Contents:
DELETE FROM w1interwiki;
LOAD DATA LOCAL INFILE '/home/lonny1/appropedia.org/maintenance/Interwiki/interwikitable.txt' INTO TABLE w1interwiki;


===crontab===
It's crude; but it's simple and it works.
Currently this script is run daily to keep our interwiki table current. It's excessive because it's unlikely that our [[Interwiki table]] is being changed that often. A more ideal solution would be to have a page-change hook script activate this update script. Until then, here's the crontab command


/home/lonny1/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
If anyone out there improves on this script for their own site, please show me what you changed. Or, if you simply know what I could do instead, and are able to communicate it, let me know. It would greatly satisfy me to make this better.


Make sure to add the desired frequency info to the front.


==Table Format Conversions==
Line 67: Line 59:


===White Space to Wiki Code===
A quick way to write an Interwiki table in Wiki code, correctly formatted for our script to parse.
If you need to do the reverse, that is, take the contents of the interwiki table (tab or white space delimited) and convert them to a table in wiki code, use this:


  gawk 'BEGIN { print "\{| class\=\"plainlinks\"" } ; { print "| " $1 " || " $2 "\n|-" } ; END { print "|\}" }' file.in >file.out


This command requires input similar to what is produced by the command 'White Space to Tab Delimited' above. Similar meaning that the fields are separated by tabs and/or multiple spaces (any white space).
==Included by Mediawiki==
FYI: Mediawiki installation includes a few collections of interwiki prefixes in the maintenance folder.
The files Mediawiki includes are:
:interwiki.sql
:wikipedia-interwiki.sql
:wiktionary-interwiki.sql
If you're content with the content of any of these collections and you have no need to regularly maintain your own collection, it's fairly simple to call them into your interwiki table. They need to be slightly edited first.
===Editing Included Interwiki files===
Everything above the line
REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES
needs to be deleted. And your own interwiki table name needs to replace theirs. In appropedia's case:
REPLACE INTO /*$wgDBprefix*/w1interwiki (iw_prefix,iw_url,iw_local) VALUES

Revision as of 08:38, 29 January 2009

Automatic Updating of the Interwiki Table

The InterWiki Map charts interwiki link prefixes. MediaWiki reads all of the prefixes from the interwiki table in the website's database. The script outlined on this page simply makes it easier to maintain a set of site-wide interwiki links. It provides a means to edit the interwiki table using standard wiki markup, without having to bother with or have any knowledge of mySQL.

The Script

The script is organized in three parts:

  • The bash script, "UpdateInterwiki.sh": This script parses the most current collection of interwiki links posted at Interwiki map, reformats the content to suit mySQL and saves the result to a text file, then calls a separate mySQL script to update the database.
  • The mySQL script, "TableUpload.sql": Refreshes the interwiki table with any new updates.
  • The command line in the cron file that executes the bash script at a predetermined time.


I have chosen to organize the scripts in their own folder in the maintenance directory.

 ~/appropedia.org/maintenance/Interwiki/

Bash Script

~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh

#!/bin/bash
w3m -dump -cols 160 http://appropedia.org/Interwiki_map | gawk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' | gawk '/http:/ { print $1 "\t" $2 }' >~/appropedia.org/maintenance/Interwiki/interwikitable.txt
mysql -u***** -p***** -h mysql.appropedia.org appropedia <~/appropedia.org/maintenance/Interwiki/TableUpload.sql >tableupload.log 2>&1

Line-by-line breakdown:

  1. Designates a bash script.
  2. Calls w3m, a text-based web browser, to read the Interwiki map web page and pipes the content to gawk, a text-processing tool. The first gawk command isolates the web page text we are interested in, between "Please do not change" and "The wikis below are not part of the interwiki map", and pipes that content to a second gawk command. The second gawk command takes every line that contains "http:" and prints the prefix followed by its corresponding address, separated by a tab. The result is a properly formatted collection of table data, ready to be loaded into the interwiki table, which is saved to a file titled "interwikitable.txt".
  3. Temporarily opens our database in the mysql shell and executes the "TableUpload.sql" mysql script. Any output, including error messages, is written to the "tableupload.log" file.
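
The two gawk stages described above can be tried without a browser or a database. The following is a minimal sketch, assuming a made-up page dump (the sample text and the temporary file are illustrative, not the real Interwiki map page). It uses plain awk, which handles these patterns the same way gawk does.

```shell
#!/bin/bash
# Hypothetical stand-in for the text that "w3m -dump" would print.
sample=$(mktemp)
cat >"$sample" <<'EOF'
Navigation http://example.org/ignored-before-start
Please do not change the layout of this table.
Prefix      URL
WikiPedia   http://en.wikipedia.org/wiki/$1
MeatBall    http://www.usemod.com/cgi-bin/mb.pl?$1
The wikis below are not part of the interwiki map
Other       http://example.org/ignored-after-end
EOF

# Stage 1: keep only the lines between the two marker phrases.
# Stage 2: from lines containing "http:", print prefix, a tab, then the address.
result=$(awk '/Please do not change/,/The wikis below are not part of the interwiki map/ { print }' "$sample" |
    awk '/http:/ { print $1 "\t" $2 }')

printf '%s\n' "$result"
rm -f "$sample"
```

Only the two entries between the marker phrases should come out, one per line, with a tab between prefix and address.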

mySQL Subscript

~/appropedia.org/maintenance/Interwiki/TableUpload.sql

DELETE FROM w1interwiki;
LOAD DATA LOCAL INFILE '/home/lonny1/appropedia.org/maintenance/Interwiki/interwikitable.txt' INTO TABLE w1interwiki;

Line-by-line breakdown:

  1. Clears the interwiki table.
  2. Loads the properly formatted contents of the "interwikitable.txt" file into the interwiki table.
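
Because the first statement empties the table before the second one refills it, an empty or missing "interwikitable.txt" (for example, if the page could not be fetched) would leave the table empty. A minimal sketch of a guard that could run before the mysql call, assuming a hypothetical helper name `table_file_ok` and an illustrative rule that every line must hold exactly two tab-separated fields:

```shell
#!/bin/bash
# table_file_ok FILE
# Hypothetical helper: succeeds only if FILE is non-empty and every line
# consists of exactly two tab-separated fields (prefix and address).
table_file_ok() {
    [ -s "$1" ] || return 1
    awk -F '\t' 'NF != 2 { bad = 1 } END { exit bad }' "$1"
}

# Demonstration with throwaway files.
good=$(mktemp)
empty=$(mktemp)
printf 'WikiPedia\thttp://en.wikipedia.org/wiki/$1\n' >"$good"

if table_file_ok "$good";  then good_status=ok;  else good_status=rejected;  fi
if table_file_ok "$empty"; then empty_status=ok; else empty_status=rejected; fi
echo "good file: $good_status, empty file: $empty_status"
rm -f "$good" "$empty"
```

In UpdateInterwiki.sh, the mysql line could then be run only when the helper succeeds on the freshly written file.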

crontab

~/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh

Set the execution frequency to whatever seems appropriate. We have it run once daily.
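
As an illustration, a crontab entry for a once-daily run could look like the line printed by the sketch below. The time of day (03:00) is an assumption chosen only for the example; the script path is the one used on this page.

```shell
#!/bin/bash
# Crontab fields: minute hour day-of-month month day-of-week command
# "0 3 * * *" means every day at 03:00 (time chosen only for illustration).
script=/home/lonny1/appropedia.org/maintenance/Interwiki/UpdateInterwiki.sh
cron_line="0 3 * * * $script"
echo "$cron_line"
```

The printed line can be pasted into the editor opened by `crontab -e`.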

Disclaimer

This script is kinda cute. But really silly for the following reasons:

  1. It's ridiculous that I am using a web browser to pull content from our own database. This script would be way more slick if it didn't take such a circuitous scenic drive to the data. I just don't know a lick of PHP, the language MediaWiki is written in. So, I don't have a clue about the specific methods MediaWiki uses to pull content from the database directly in order to build the site's web pages. Not that PHP would be needed in this script -- it would just help me figure out where specifically the Interwiki_Map page goes in the database to get its content.
  2. It's ridiculous that this script is run every single day when months usually go by between changes to the Interwiki Map. The redundancy bugs me -- not that it really matters much. But there must be some page specific hook script or something that would trigger this update script, only at times when the Interwiki Map page has been edited.


It's crude; but it's simple and it works.

If anyone out there improves on this script for their own site, please show me what you changed. Or, if you simply know what I could do instead, and are able to communicate it, let me know. It would greatly satisfy me to make this better.
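
One possible way to reduce the redundancy described in the second point, without a page hook, is to reload the table only when the freshly parsed file differs from the one loaded last time. This is a sketch, not a tested part of the Appropedia setup: the helper name `map_changed` and the ".last" copy are hypothetical.

```shell
#!/bin/bash
# map_changed NEW LAST
# Hypothetical helper: succeeds if NEW differs from LAST (or LAST is missing),
# and in that case records NEW as the last-seen copy.
map_changed() {
    if [ -f "$2" ] && cmp -s "$1" "$2"; then
        return 1            # identical: nothing to do
    fi
    cp "$1" "$2"
    return 0
}

# Demonstration with throwaway files.
dir=$(mktemp -d)
printf 'WikiPedia\thttp://en.wikipedia.org/wiki/$1\n' >"$dir/interwikitable.txt"

if map_changed "$dir/interwikitable.txt" "$dir/interwikitable.last"; then first=reload; else first=skip; fi
if map_changed "$dir/interwikitable.txt" "$dir/interwikitable.last"; then second=reload; else second=skip; fi
echo "first run: $first, second run: $second"
rm -rf "$dir"
```

The daily cron job would still fetch and parse the page, but the database would only be touched on days when the map actually changed.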


Table Format Conversions

All commands are written for a Unix environment.

White Space to Tab Delimited

Population of the Interwiki table requires specific formatting, as expected by the mysql command LOAD DATA ... INTO TABLE. The script used for appropedia.org uses mysql's default format: one entry per line, with fields delimited by tabs.

gawk '/ http:/ { print $1 "\t" $2 }' file.in >file.out

This command line requires that each Interwiki table entry is on its own line and that the prefix is separated from its address by some amount of white space. Also, the prefix must be the first entry and the address the second. This format is how Meatball Wiki provides their InterMap text file.
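
To make the input and output formats concrete, here is a small sketch that runs the same pattern over made-up input. The sample entries and the temporary file are assumptions for illustration, and plain awk is used in place of gawk.

```shell
#!/bin/bash
# Made-up input in the whitespace-separated style described above.
input=$(mktemp)
cat >"$input" <<'EOF'
# lines without an address are skipped

WikiPedia   http://en.wikipedia.org/wiki/$1
MeatBall http://www.usemod.com/cgi-bin/mb.pl?$1
EOF

# Same pattern as the command above: match lines with " http:" and
# print the first two fields separated by a tab.
converted=$(awk '/ http:/ { print $1 "\t" $2 }' "$input")
printf '%s\n' "$converted"
rm -f "$input"
```

Each output line holds the prefix, a single tab, and the address, regardless of how many spaces separated them in the input.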

White Space to Wiki Code

If you need to do the reverse, that is, take the contents of the interwiki table (tab or white space delimited) and convert them to a table in wiki code, use this:

gawk 'BEGIN { print "{| class=\"plainlinks\"" } ; { print "| " $1 " || " $2 "\n|-" } ; END { print "|}" }' file.in >file.out

This command requires input similar to what is produced by the command 'White Space to Tab Delimited' above. Similar meaning that the fields are separated by tabs and/or multiple spaces (any white space).
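
As a rough illustration of what the conversion produces, the sketch below runs the same awk program, laid out over several lines, on two made-up entries. The sample data and temporary file are assumptions; plain awk stands in for gawk.

```shell
#!/bin/bash
# Made-up input: one tab-separated entry and one space-separated entry.
input=$(mktemp)
printf 'WikiPedia\thttp://en.wikipedia.org/wiki/$1\nMeatBall   http://www.usemod.com/cgi-bin/mb.pl?$1\n' >"$input"

# BEGIN opens the wiki table, each input line becomes one row followed by
# a row separator, and END closes the table.
# Braces and "=" need no backslash inside an awk string.
wiki=$(awk 'BEGIN { print "{| class=\"plainlinks\"" }
            { print "| " $1 " || " $2 "\n|-" }
            END { print "|}" }' "$input")
printf '%s\n' "$wiki"
rm -f "$input"
```

The result is a wiki table with one row per entry, which can be pasted into the Interwiki map edit page.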


Included by Mediawiki

FYI: A MediaWiki installation includes a few collections of interwiki prefixes in the maintenance folder.

The files Mediawiki includes are:

interwiki.sql
wikipedia-interwiki.sql
wiktionary-interwiki.sql

If you're content with the content of any of these collections and you have no need to regularly maintain your own collection, it's fairly simple to call them into your interwiki table. They need to be slightly edited first.

Editing Included Interwiki files

Everything above the line

REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES

needs to be deleted. And your own interwiki table name needs to replace theirs. In appropedia's case:

REPLACE INTO /*$wgDBprefix*/w1interwiki (iw_prefix,iw_url,iw_local) VALUES
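
The two manual edits described above could also be scripted. This is only a sketch, assuming a made-up miniature of an included file (the header notes and the single data row are invented for the example) and Appropedia's table name, w1interwiki.

```shell
#!/bin/bash
# Made-up miniature of an included file: header notes, then the REPLACE statement.
src=$(mktemp)
cat >"$src" <<'EOF'
-- Header notes that need to be removed
-- before the file is loaded.

REPLACE INTO /*$wgDBprefix*/interwiki (iw_prefix,iw_url,iw_local) VALUES
('wikipedia','http://en.wikipedia.org/wiki/$1',1);
EOF

# 1. Print from the REPLACE INTO line to the end of the file,
#    which drops everything above that line.
# 2. Swap the table name for the site's own (w1interwiki here).
edited=$(sed -n '/^REPLACE INTO/,$p' "$src" |
    sed 's|\*/interwiki |*/w1interwiki |')
printf '%s\n' "$edited"
rm -f "$src"
```

Since the included files are overwritten by a MediaWiki upgrade, a scripted edit like this would need to be rerun on the fresh copies afterwards.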