Appropedia:Porting/Websites

To help:

Pick a blog (from the section #To be ported or any suitable open-licensed source).
Follow the steps at #Conversion

To be ported

These are useful blogs and simple HTML sites that can be easily converted to MediaWiki (see sections below).

Note: Only sites that are CC-BY, CC-BY-SA or public domain may be listed here.

Priorities

Start with these quick, easy pages (basic HTML), which are easily converted. This list is transcluded from Appropedia:Porting/Websites - see that page for porting instructions and other info.

http://web.archive.org/web/20140626135900/http://www.livinggreener.gov.au/ - CC-by notice

Tech development

If there's a way to download & turn whole sites into MediaWiki markup in one go, that would be even better - i.e. scr*ping as the first step. (That word triggers certain filters, hence the censorship.) Once they're in MediaWiki format, they can then be sifted and moved manually to suitable pages. - It's probably just as easy (and safer) to do it page by page, as we have an easy conversion tool in the Wikedbox.
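As a rough illustration of what the conversion step involves, here is a minimal sketch in Python that turns a small subset of HTML into MediaWiki markup. It is not how Wikedbox works internally; the class name `WikiTextConverter`, the function `html_to_wikitext`, and the sample URL are all hypothetical, and only headings, paragraphs, bold, italics, links and list items are handled.

```python
import re
from html.parser import HTMLParser


class WikiTextConverter(HTMLParser):
    """Convert a small subset of HTML to MediaWiki markup (illustrative only)."""

    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}

    def __init__(self):
        super().__init__()
        self.parts = []       # output fragments, joined at the end
        self.link_stack = []  # href of each currently open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            href = dict(attrs).get("href")
            self.link_stack.append(href)
            if href:
                self.parts.append("[" + href + " ")
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.parts.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a" and self.link_stack:
            if self.link_stack.pop():
                self.parts.append("]")

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would when rendering.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_wikitext(html):
    """Return MediaWiki markup for the given HTML fragment."""
    converter = WikiTextConverter()
    converter.feed(html)
    converter.close()
    text = "".join(converter.parts)
    # Remove spaces around line breaks (a leading space means
    # preformatted text in MediaWiki), then collapse blank lines.
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()


sample = (
    "<h2>Rocket stoves</h2><p>A <b>simple</b> stove. See "
    '<a href="http://example.org/stove">the plans</a>.</p>'
    "<ul><li>Cheap</li><li>Efficient</li></ul>"
)
print(html_to_wikitext(sample))
```

Images, tables and nested lists are deliberately left out, which is part of why converting page by page and checking the result is the safer route.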

Blogs - start here

http://michaelkeizer.com/humourless/ (Started Steven M.) A Humourless Lot staging area
Other open content (CC-BY or CC-BY-SA or public domain) posts at blogs relevant to Appropedia
http://www.theoildrum.com - mostly very relevant. Big job.
- Minor point to be aware of: Subsite structure could be a bit confusing. Choose one subdomain at a time, and go from most recent to oldest. There are links to the sub-sites across the top... (The Australia-New Zealand blog has its own anz subdomain, but also appears in the www site, e.g. http://www.theoildrum.com/node/3464 - should become clear as we work on it.)
http://shikigami.net/en/ - now-defunct food forest in Japan. Content released as anticopyright.

Non-blog sites

These are not in a simple chronological order, so they will need a plan for copying the pages.

Paste directly into pages with descriptive names - based on name of original page, but according to Appropedia naming conventions, i.e. lower case except for proper names. Modify page name when needed to avoid clashes or ambiguity.
For pages that are a series, add a table of contents, like {{TTH chapter links}}, at the top of each page. Make a list of all pages as you create them, in order, for this table of contents.
Add an attribution notice at the bottom.

Specifics:

Family Planning: A Global Handbook for Providers - looks like it's public domain (partly US fed govt funded? no copyright notice?) but check first. (website down --Ethan (talk) 01:12, 1 November 2015 (PST))
Strong Towns, urban planning, economic growth and related issues.
Demotech (site). First we must create an attribution template, noting that the original author would appreciate feedback, maybe linking to a forum thread. Then start with publications.
- There is a wiki section of the site as well - leave this for now, as it might be handled better in a different way (I'm looking at PmWiki to MediaWiki conversion: reverse this method? & I've emailed this person. --Chriswaterguy 12:18, 4 March 2010 (UTC))
http://www.slowsandfilter.org - started at Slow Sand Filter staging area (Phil working on this)
http://www.shared-source-initiative.com/biosand_filter/biosand.html - started at Shared Source Initiative biosand staging area (Phil working on this)
Encyclopedia of Earth - CC-BY-SA.
Livablestreets - great pages (I've pinged them re collaboration, waiting for response ~ 26 Oct 2009. We can port their content, but think first - are we porting all, or a selection? --Chriswaterguy 05:04, 26 October 2009 (UTC)). Update: They're shutting down their site and moving the wiki elsewhere, possibly in April. I've asked for details.

Check these

These sites might have good relevant content, or have good relevant content that you need to dig for. Note that just because it's a great site doesn't mean it's suitable for porting to Appropedia. Scan through quickly and pick what's suitable. (Upcoming events may be good; past events are only interesting if they have interesting info, e.g. about practices, designs, or networks.)

Asia Pacific Civil-Military Centre of Excellence - Publications (Aust govt think tank - Chriswaterguy knows their social media person) - all new publications should be under CC-BY, but double-check.
http://builtenvironmentblog.blogspot.com/ (e.g. the critique of superblocks). Images may not be open licensed.
http://www.designinnovation.ie
NetSquared - activism and change - will probably include some suitable content. (Anything suitable on the main site, http://www.netsquared.org/ ?)
http://learningforsustainability.net/
http://teamsuperforest.org/ - many posts, occasionally one suitable for the wiki. Scroll through as far as October 27, 2009 Impression Sessions: New Zealand - these have been done, so skip to October 18, 2009 (Millennium Seed Bank Hits 10% Target: These Seeds are Bananas! on page 4 or later) - Millennium Seed Bank is good, then look for later posts.
http://www.opportunitysustainability.com/ (haven't looked closely yet. suitable?)
Mana Mushrooms - First we must clarify the license (request clear CC license) then create a note that the original author would appreciate feedback (Mana Mushrooms#Porting pages).
http://mises.org/ - libertarian think tank; thoughtful arguments from a very different perspective than most of our sources, i.e. very valuable for balance. (Most importantly, some of the arguments are sound.) Need to search the site, as only a small selection of the topics is suitable for Appropedia. (Note the pages on Appropedia which already cite or include content from Mises Institute.)
Otago Polytechnic, NZ: All work created by the university and its members is automatically available via CC BY. Look around the site to see what is of value, and double-check that this license applies for that particular content.
CD3WD - as open access. This is low priority: Note that this is already available online, we aren't really importing it to be used as wiki content (as most or all is not open licensed), and we don't have our own records of open access permissions.

Places to look for more

Public domain info mentioned on Appropedia:Porting. Finding well-organized online manuals/guides is perfect, though these are often in PDF. (The Public Domain Search will be useful, but it needs to be restarted. Ping me if I haven't done this by mid-March 2010. --Chriswaterguy 14:56, 20 February 2010 (UTC))

Already converted, now needing more processing, breaking up and editing into suitable pages. Occasionally these will need updating with new posts:

Bloodandmilk/content staging area
Afrigadget/content staging area

Conversion

Requirements: Firefox web browser (other Mozilla browsers should also work)

Choose a blog from the list above.
Start a new page with a "staging area" in the title, e.g. "Blog name staging area"
At the top of the page, paste this "code":

{{content staging area}}

== Content ==

Above the "Content" header, add the URL of the site you are porting.
You have created the target page - leave it open and we'll come back to it later.
Open Wikedbox in a separate window.
Copy and paste the formatted text from the blog into the box at Wikedbox. It should keep much of its formatting.
Continue to copy all the content from the blog - the front page and following pages.
- If the standard blog view shows the complete posts, select the body of each page, and paste it in. Then go to the next page and do the same, pasting it below the previous content in the Wikedbox. ("Body" means everything except headers, sidebars and footers - just the blog posts themselves.)
- If the standard view only displays the beginning of each post, so that the blog can't be copied in this way, it will be more work. Open each blog post in a separate window and copy the body of the post to Wikedbox.
If it is not a blog, but is divided by topic:
- Find where the pages are all listed e.g. sitemap or navbar.
- You may wish to add a "Progress" header above "Content", especially if you're not doing it all at once.
- When you copy in each page, start with four dashes ---- then the name of the page, e.g.

----

Rocket stoves

When you have a good amount of content (maybe you're afraid of losing the content if the browser crashes), click the wikify button above the edit box and wait until conversion is complete (should be a few seconds, depending on page size).
Copy the converted wiki text (ctrl+a to select all, then ctrl+c) and paste into the target page, below the "Content" header.
Click "Save."

Thank you for your help! You'll end up with a long page that looks very messy - this is an important first step in porting this content to Appropedia. Don't try to fix up the broken links & formatting - leave that for the next step.
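The layout of the target page described in the steps above can be sketched as follows. This is only an illustration of the page structure, assuming you already have converted wiki text for each page; the function name `build_staging_page` and the sample URL and page names are hypothetical.

```python
def build_staging_page(source_url, pages):
    """Return wiki text for a staging page.

    source_url: URL of the site being ported (goes above "Content").
    pages: list of (page_name, converted_wikitext) tuples, in site order.
    """
    lines = ["{{content staging area}}", "", source_url, "", "== Content ==", ""]
    for name, body in pages:
        # Each copied page starts with four dashes, then the page name.
        lines += ["----", "", name, "", body.strip(), ""]
    return "\n".join(lines).rstrip() + "\n"


page_text = build_staging_page(
    "http://example.org/",
    [("Rocket stoves", "Text A"), ("Solar cookers", "Text B\n")],
)
print(page_text)
```

Keeping the page names in a list like this also gives you the ordered list of pages needed later for a table of contents.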

Basic fixing up

A bot (probably ChriswaterguyBot) will then add templates to each post, convert tags to categories, turn some of the broken image links into external links, and make other repetitive fixes.

For reference, these are some of the commands used on one of the blogs:

python replace.py -regex "\[(\S*).*\[ IMAGE_LINK_HERE.*\d*px\|([^\]]*)]*" "[\\1 Image: \\2]" -page:"Afrigadget/content staging area" -summary:"fix image links (make into proper external links)"

python replace.py -regex "\[(\S*).*\[ IMAGE_LINK_HERE.*\d*px\|(.*)]" "" -page:"Afrigadget/content staging area" -summary:"making break between posts"

python replace.py -regex "(Tags: ]|, ]|SHARETHIS.addEntry\({.*\n| \| \[\S*#(comments|respond)\s*\d* .*omment.*])" "" -page:"Afrigadget/content staging area" -summary:"removing misc bits of code from converted HTML"
python replace.py -regex "(Filed in:[\n\s]*|, )\[http://www.afrigadget.com/category/\S* ([^\]]*)]" "\n" -page:"Afrigadget/content staging area" -summary:"Change blog categories to wiki categories"

 python replace.py -regex "\[(\S*).*Posted: [^\[]*(\[.*])" "{{attrib afrigadget|url=\\1|author=\\2}}" -page:"Afrigadget/content staging area" -summary:"replacing header with 'attrib afrigadget' template"

 python replace.py -regex "(\[http://www.afrigadget.com/tag/)(\S* )([\w\-\s]*])" "]\n[[Category:\\3]" -page:"Afrigadget/content staging area" -summary:"changing tags to categories"
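To see what the "changing tags to categories" command above does, the same pattern can be tried out with Python's `re` module. This is a sketch only: it assumes replace.py applies the pattern roughly as `re.sub` would, the function name `tags_to_categories` is hypothetical, and the sample line is made up to resemble a converted tag link.

```python
import re

# Pattern and replacement copied from the "changing tags to categories"
# command; the shell-level "\\3" is what Python receives as "\3".
TAG_PATTERN = r"(\[http://www.afrigadget.com/tag/)(\S* )([\w\-\s]*])"
TAG_REPLACEMENT = r"]\n[[Category:\3]"


def tags_to_categories(text):
    """Turn converted tag links into wiki category links."""
    return re.sub(TAG_PATTERN, TAG_REPLACEMENT, text)


# Hypothetical sample line, not taken from the real staging page.
sample = "Tags: [http://www.afrigadget.com/tag/solar solar power]"
result = tags_to_categories(sample)
print(result)
```

Group 3 already contains the closing bracket of the external link, which is why the replacement adds only one more "]" to complete the category link. The leftover "Tags: ]" on the first line is the kind of fragment that the "removing misc bits of code" command matches.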

For the sand filter pages:

python replace.py -regex "(<[/]*div>)*(\t)*" "" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "
python replace.py -regex "{\|.*border=\"0\"" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "
python replace.py -regex "\| colspan=\"5\" \|" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "
python replace.py -regex "\| (
)*(
)* )*" "" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "
python replace.py -regex "<br([ ]*[/]*)>" "\n\n" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "
python replace.py -regex "\r\n\r\n-\r\n\r\n" "\n\n" -page:"Slow Sand Filter staging area" -page:"Shared Source Initiative biosand staging area "

\| bgcolor=\"#......="[ ]*(rowspan=\"2\")*[ ]*\|

For FEMA pages:

remove (<[/]*div>)*(\t)*
replace ''' with == for headings
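The heading fix can be sketched in Python as below. It assumes that a heading is a line consisting only of bold text, so that bold words inside ordinary sentences are left alone; the function name `bold_lines_to_headings` and the sample text are hypothetical.

```python
import re

# A line made up of exactly one bold span: '''Some heading'''
BOLD_LINE = re.compile(r"^'''((?:(?!''').)+)'''[ \t]*$", re.MULTILINE)


def bold_lines_to_headings(text):
    """Rewrite whole-line bold text as level-2 wiki headings."""
    return BOLD_LINE.sub(r"== \1 ==", text)


sample = "'''Water storage'''\nKeep '''clean''' water in covered containers.\n"
converted = bold_lines_to_headings(sample)
print(converted)
```

Preview the result before saving, since some whole-line bold text may be emphasis rather than a heading.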

If you're also interested in running a bot, that would be really appreciated. Contact me. --Chriswaterguy 08:51, 19 February 2010 (UTC)

Preview and fix problems as you are able to (or save first then do the fixing).
Note that the conversion works for formatted text, but not for images. The URLs of images are replaced with their filenames, so you must decide what to do:
- Upload the images and ensure the links are correct - only if the images are open licensed; or
- Manually fix the image links so that they point to the original images - where they are very relevant, but not open licensed; or
- Strip out the links entirely (easiest option)
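For the last option, a minimal Python sketch is shown here. It assumes the converted image links appear in the usual MediaWiki form, `[[Image:...]]` or `[[File:...]]`, with no nested links inside the caption; check the actual staging page first, as the converted text may look different. The function name `strip_image_links` and the sample text are hypothetical.

```python
import re

# Matches [[Image:...]] or [[File:...]] with no nested square brackets.
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):[^\[\]]*\]\]", re.IGNORECASE)


def strip_image_links(text):
    """Remove image links, leaving ordinary wiki links untouched."""
    return IMAGE_LINK.sub("", text)


sample = (
    "Intro text.\n"
    "[[Image:stove.jpg|200px|A stove]]\n"
    "More text. [[Rocket stove]] stays."
)
stripped = strip_image_links(sample)
print(stripped)
```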

WikEd instructions are found at Wikipedia:User:Cacycle/wikEd.

Making articles

After all that is done and the green light is given, it's time to convert to articles. Remove the text from the "staging area" and

Where there are a lot of borderline cases, it's best done by or with someone who is familiar with Appropedia, who can make judgements about which content is suitable.