==Methods==


===1. Save as formatted text===
Convert to a Word, RTF or HTML file:
* In the non-free Adobe Acrobat there is an option to save to Word format[http://www.adobe.com/products/acrobatpro/acrobatstd.html], but apparently not in the free Adobe Reader. If you have access to the full, non-free Adobe Acrobat (v 5.0 or higher should work[http://www.library.mcgill.ca/edrs/services/publications/howto/PDFtoXLS/PDFtoExcel.html]), please try this and let us know here whether it works. Your workplace, university or school may be able to give you access to it. The free program (in both the Linux and Windows versions) seems to offer only plain text. <!--If you open this in OpenOffice v 2.3 or greater, you should be able to export it as MediaWiki format. (Does this work smoothly?)-->
* Acrobat Reader has an "export as text" function, but it produces only plain text. Copying and pasting also gives only plain text.
** Test other readers to see whether any preserve formatting. The following give only plain text: Evince Document Viewer 2.00 for Linux...
* Use [[wikEd]] - this doesn't work yet, as the formatting is not saved when pasting into the edit box. Are there PDF readers or editors (or any other program which can open these files) which allow the formatting to be copied and pasted?


====Freeware & free online services====
* http://www.convert-in.com/pdfekit.htm costs $29 for the [http://www.convert-in.com/pdf2word.htm PDF-to-HTML] converter alone. A demo is available (is it fully functional?)


====OCR====


When a PDF file (or other format) is image-based rather than text-based, OCR (optical character recognition) may be helpful. See [[User talk:LeissKG]] for a discussion of this technique.


If the other techniques above are successful, then OCR should probably not be used, as it will inevitably introduce some errors. It seems likely to be more difficult as well.{{fact}}  
===2. Convert from formatted text to MediaWiki===
* Use [[wikEd]], or
* Open the Word or RTF document in OpenOffice (version 2.3 or higher), then under the File menu{{fact}} choose ''Export'', then choose MediaWiki format.


=== Manual formatting - old method ===

Revision as of 13:46, 29 January 2008

This is still a work in progress; you can help by trying these methods and adding any information about what works. Contact Chriswaterguy or Curt if you have questions.


Methods

1. Save as formatted text

Convert to a Word, RTF or HTML file:

  • In the non-free Adobe Acrobat there is an option to save to Word format[1], but apparently not in the free Adobe Reader. If you have access to the full, non-free Adobe Acrobat (v 5.0 or higher should work[2]), please try this and let us know here whether it works. Your workplace, university or school may be able to give you access to it. The free program (in both the Linux and Windows versions) seems to offer only plain text.
  • Acrobat Reader has an "export as text" function, but it produces only plain text. Copying and pasting also gives only plain text.
    • Test other readers to see whether any preserve formatting. The following give only plain text: Evince Document Viewer 2.00 for Linux...

Freeware & free online services

Check these (and do a search to make sure you've got the latest version):

  • Zamzar - upload the file and receive an email with a link to the output file. Works well, though with some hassle and hiccups. Formatting may need extra work, e.g. double line breaks need replacing with single line breaks for best results. This is the only solution known to work so far.
  • Sorax PDF SDK DLL Edition 1.1 - "export PDF files to... XML." (image or text?)
  • Adobe's online conversion service - tends to be slow - if it works at all.
  • Free PDF to Word Doc Converter - reviews and comments[3] suggest that this is "nagware" (i.e. freeware hassles you, adds extra steps) and that Zamzar (above) gives better results.
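The line-break clean-up mentioned for Zamzar output can be scripted instead of done by hand. This is a minimal sketch in Python. It assumes the converted text is already available as a string and that every double line break should become a single one. The function name collapse_double_breaks is only an illustrative choice.

```python
def collapse_double_breaks(text: str) -> str:
    """Replace every double line break with a single one."""
    # Normalise Windows (CRLF) line endings first, so "\r\n\r\n" is
    # treated the same as "\n\n".
    text = text.replace("\r\n", "\n")
    # str.replace works left to right without overlapping matches, so a
    # run of four newlines (two double breaks) becomes two, not one.
    return text.replace("\n\n", "\n")


if __name__ == "__main__":
    sample = "First line\n\nsecond line\r\n\r\nthird line"
    print(collapse_double_breaks(sample))
```

In MediaWiki a blank line starts a new paragraph. If the converted text also uses double line breaks between real paragraphs, this script merges those too, so check the result before saving.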

Commercial programs (apart from Adobe Acrobat)

Question: are there free trial versions that do what we need? Help by trying them out. (These programs are not guaranteed - do some Googling to make sure they're safe, and/or make sure you've got good anti-spyware and anti-virus.)

These are not ideal, because 1. we can't invite everybody to help out without paying a lot of money or stretching/breaking the licensing agreements, 2. they usually take an extra step, via Word, and 3. they're only for Windows.

But for reference (in case of desperation):

OCR

When a PDF file (or other format) is image-based rather than text-based, OCR (optical character recognition) may be helpful. See User talk:LeissKG for a discussion of this technique.

If the other techniques above are successful, then OCR should probably not be used, as it will inevitably introduce some errors. It seems likely to be more difficult as well.[verification needed]
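Before reaching for OCR, it helps to know whether a PDF actually contains text. This sketch relies on a rough heuristic, which is an assumption rather than an established test: text-based PDFs declare fonts, while a plain scan usually contains only images. The Python below looks for the PDF name /Font in the raw file and inside any Flate-compressed streams. It is not a real PDF parser and can be fooled, for example by encrypted files or unusual compression filters. The function name pdf_probably_has_text is hypothetical.

```python
import re
import sys
import zlib

# A stream body sits between the keywords "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def pdf_probably_has_text(data: bytes) -> bool:
    """Rough guess: does this PDF declare any fonts?

    Text-based PDFs need fonts to draw their text; a pure scan usually
    has none. This is only a heuristic, not a real PDF parser.
    """
    if b"/Font" in data:
        return True
    # Font dictionaries can also sit inside Flate-compressed object
    # streams (PDF 1.5 and later), so inflate each stream and look again.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-compressed, or not cleanly delimited
        if b"/Font" in inflated:
            return True
    return False


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            has_text = pdf_probably_has_text(f.read())
        print(path, "text-based" if has_text else "probably image-based (try OCR)")
```

Save the script under any name and pass one or more PDF files on the command line. A file reported as probably image-based is a candidate for OCR.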

2. Convert from formatted text to MediaWiki

  • Use wikEd, or
  • Open the Word or RTF document in OpenOffice (version 2.3 or higher), then under the File menu[verification needed] choose Export, then choose MediaWiki format.
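If OpenOffice is not available, a script can also turn the HTML output of a PDF-to-HTML converter into wikitext. This is a different technique from the OpenOffice export above, and the code is only a minimal sketch. The class and function names are hypothetical. It handles headings, bold, italics, paragraphs, line breaks, bulleted and numbered lists, and external links. It ignores every other tag (tables, images, fonts and so on) and keeps only their text.

```python
import re
from html.parser import HTMLParser


class WikiConverter(HTMLParser):
    """Translate a small subset of HTML tags into MediaWiki markup."""

    HEADINGS = {f"h{n}": "=" * n for n in range(1, 7)}
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}

    def __init__(self):
        super().__init__()
        self.out = []  # pieces of wikitext, joined at the end
        self.lists = []  # stack of "*" (ul) / "#" (ol) for nested lists
        self.href = None  # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "br":
            self.out.append("<br />")
        elif tag in ("ul", "ol"):
            if not self.lists:
                self.out.append("\n")  # blank line before a top-level list
            self.lists.append("*" if tag == "ul" else "#")
        elif tag == "li":
            self.out.append("\n" + "".join(self.lists) + " ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[" + self.href + " ")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.out.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self.out.append("\n")
        elif tag == "a" and self.href:
            self.out.append("]")
            self.href = None

    def handle_data(self, data):
        # Collapse whitespace runs as a browser would, without doubling a
        # space that the previous piece already ends with.
        text = re.sub(r"\s+", " ", data)
        if self.out and self.out[-1].endswith((" ", "\n")):
            text = text.lstrip(" ")
        if text:
            self.out.append(text)


def html_to_wiki(html: str) -> str:
    parser = WikiConverter()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    # Squeeze runs of blank lines down to a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `html_to_wiki("<h2>Intro</h2><p>Some <b>bold</b> text.</p>")` gives a `== Intro ==` heading followed by the paragraph with `'''bold'''`. Tables and images still need manual work. Relative links, such as links to other pages of the converted output, will not become valid wiki links.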


Manual formatting - old method

This is not recommended, but if you have problems with the other methods and need to try it, see Help:Porting PDF files to MediaWiki (old method, manual formatting).

Images

Images must be saved and uploaded.

  • Until now, this has been done as described at Help:Porting PDF files to MediaWiki (old method, manual formatting) #Transfer the images. There may be easier ways now, but that page still has useful information and tips. For example, don't try too hard to match the layout of the original: PDFs have a fixed page size, while the layout of the wiki article will flex based on several variables. So invest some energy in layout, but don't overdo it.
  • In PDF-to-HTML conversion the images will be output in the same folder. (However, with Zamzar, each page's images are turned into a single image taking up the whole page - the text fits around it.)
  • In PDF-to-Word conversion the images will be integrated in the document.
  • Acrobat: images are apparently saved automatically during file export.

Question: Which of the formats include tags to indicate image location?
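In HTML output, each image is normally referenced by an <img> tag at the point where it appears in the page. The Python sketch below lists those images in document order as MediaWiki file links. It assumes that each image is uploaded to the wiki under its original file name, and that a thumbnail (thumb) is an acceptable starting layout. The class and function names are hypothetical, and embedded "data:" images are not handled.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class ImageFinder(HTMLParser):
    """Collect the src of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_wikilinks(html: str) -> list[str]:
    """Return one [[File:...|thumb]] link per <img>, in page order.

    Assumes each image is uploaded under its original file name.
    """
    finder = ImageFinder()
    finder.feed(html)
    finder.close()
    links = []
    for src in finder.sources:
        # Decode %20 etc. and accept Windows-style backslash paths.
        path = unquote(urlparse(src).path).replace("\\", "/")
        links.append(f"[[File:{posixpath.basename(path)}|thumb]]")
    return links
```

Each link can then be pasted at the matching place in the article once the image has been uploaded.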
