Background

I was curious whether it would be possible, in theory, to use the docx format as a storage format for markdown documents and their associated image files, while also supporting light editing of the docx file directly.

In other words, the primary authoring modality would be markdown plus attachments, because this works really well for software documentation, while storage and sharing would happen via docx files. Occasional direct edits to the docx should additionally be possible without breaking the normal markdown workflow.

Results

I created a small experiment and put it up on GitHub as pandoc-md-docx-roundtrip. In it, I converted a README.md containing source code blocks, LaTeX math and an image to docx, and then converted that back to markdown with pandoc, using two Lua filters to help maintain the code block language labels.
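
To make the two conversion steps concrete, here is a minimal Python sketch that shells out to pandoc. It assumes pandoc is on the PATH and that there is one Lua filter per direction. The filter file names are hypothetical placeholders, since the post does not name them, and `-t markdown` is an assumed output flavour. `--lua-filter` and `--extract-media` are standard pandoc options.

```python
import subprocess

# Hypothetical filter paths: the post mentions two Lua filters for code block
# language labels but does not name them, so these are placeholders.
MD_TO_DOCX_FILTER = "filters/codeblock-lang-to-docx.lua"
DOCX_TO_MD_FILTER = "filters/codeblock-lang-from-docx.lua"


def to_docx_cmd(md: str, docx: str) -> list[str]:
    """Build the pandoc command line for markdown -> docx."""
    return ["pandoc", md, "-f", "markdown", "-t", "docx",
            "--lua-filter", MD_TO_DOCX_FILTER, "-o", docx]


def to_markdown_cmd(docx: str, md: str, media_dir: str = ".") -> list[str]:
    """Build the pandoc command line for docx -> markdown, extracting images."""
    return ["pandoc", docx, "-f", "docx", "-t", "markdown",
            f"--extract-media={media_dir}",
            "--lua-filter", DOCX_TO_MD_FILTER, "-o", md]


def roundtrip(md_in: str, docx: str, md_out: str) -> None:
    """Run markdown -> docx -> markdown; check=True raises if pandoc fails."""
    subprocess.run(to_docx_cmd(md_in, docx), check=True)
    subprocess.run(to_markdown_cmd(docx, md_out), check=True)
```

With pandoc and the two filters in place, `roundtrip("README.md", "README.docx", "README-from-docx.md")` runs both steps. `--extract-media` writes the images embedded in the docx out to disk and points the markdown image links at those files.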

On GitHub, you can find the input README.md and the roundtripped output README-from-docx.md.

Except for the extra metadata around the image, I’m pretty happy with how close the roundtripped markdown is to the original input.
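
If the extra image metadata bothers you, one possible workaround (not something the experiment does) is a small post-processing pass over the roundtripped markdown. This Python sketch assumes the metadata is a pandoc attribute block directly after the image, such as `{width="6.5in" height="3.2in"}`, and removes any `{...}` block that immediately follows an inline image. The function name is illustrative.

```python
import re

# An inline image ![alt](target) immediately followed by a {...} attribute
# block. The negated character classes also match newlines, so a block that
# pandoc wrapped across two lines is still caught.
_IMAGE_WITH_ATTRS = re.compile(r"(!\[[^\]]*\]\([^)]*\))\{[^}]*\}")


def strip_image_attributes(markdown: str) -> str:
    """Keep the image itself, drop the attribute block that follows it."""
    return _IMAGE_WITH_ATTRS.sub(r"\1", markdown)
```

The trade-off is that attributes you added to images deliberately are dropped too, so this only makes sense if the original markdown never uses them.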

Interestingly, the image extracted from the docx is identical to the original, as confirmed by md5sum, which in my case means it can even be re-opened and amended with Excalidraw.
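
The same md5sum check can be scripted without extracting anything first, because a .docx file is a ZIP package. This Python sketch hashes the original image and compares it against every entry in the package. The function names are illustrative and not part of the experiment.

```python
import hashlib
import zipfile


def md5_file(path: str) -> str:
    """Hex MD5 digest of a file, the same value md5sum prints."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large images are not loaded in one go.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def embedded_in_docx(image_path: str, docx_path: str) -> bool:
    """True if any entry in the .docx ZIP package is byte-identical to image_path.

    Word-style packages normally keep images under word/media/, but every
    entry is checked so the sketch does not depend on that layout.
    """
    target = md5_file(image_path)
    with zipfile.ZipFile(docx_path) as zf:
        return any(
            hashlib.md5(zf.read(name)).hexdigest() == target
            for name in zf.namelist()
        )
```

A byte-identical match is what makes the Excalidraw round trip work, since that editor reads its scene data back out of the image file itself.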

Screenshots

Below you can see screenshots of the input markdown in VSCode and the docx conversion in Word online, because everyone loves screenshots.

Editing original markdown in VSCode

Output DOCX in Word online