WordPress Book Posting Scripts

Module 1: doc_to_html.py (.docx to .html)

The first thing I had to do was find a way to convert my document to html. I did so with the following script:

import win32com.client as win32
import os

def convert_to_html(doc_path):
    """
    This uses microsoft word and win32com to open a specified document and then
    save it out to html.
    """
    word = win32.Dispatch("Word.Application")
    word.Visible = 0

    word.Documents.Open(doc_path)
    doc = word.ActiveDocument

    save_as_file = os.path.join(os.getcwd(), r'htmls\temp.html')
    print "Saved"
    doc.SaveAs(FileName=save_as_file, FileFormat=10)
    doc.Close()
    print "Closed"
    return save_as_file

if __name__ == '__main__':
    doc_folder = r"C:\PathToTheFolderWithYourBook"
    doc = "NameOfYourBookFile.docx"
    doc_path = os.path.join(doc_folder, doc)
    return_path = convert_to_html(doc_path)

I developed this script with reference to stackoverflow answers on using win32com to automate Microsoft word that I’d Googled.

The variable FileFormat in line 17 determines what the output file format will be, not the extension for the new file name in line 15. If you want to adjust the script to convert word documents to other types of files, you can find out what to change the FileFormat to by referencing this list of word save format id numbers.

Utility Scripts , ,

Leave a Reply

Your email address will not be published. Required fields are marked *