Sundered Peak

through the mind of kyle tolle

The Downfall Of Formats

During 2015, I wanted to spend more time at home, and cultivate creativity there. Because I’ve enjoyed writing since I was young, I aimed to write more consistently. The frequency with which I’ve published blog posts this year is a signal of my progress. I’m proud of this, and hope to continue this trend next year as well.

My second goal has been to preserve my older writings. What does that even mean? For physical writings that exist on paper, it means digitizing them into a format that has the potential to survive technological upheavals. For writings which are already digital, it means rescuing them from the graveyards in which they’re currently trapped.

On this front, I’ve started preserving my early poetry, and typing up hand-written pieces. Going through this process has brought to mind why this is important to me. Bear with me; this is a weaving exploration of these goals and other topics. I won’t apologize, however. Life is not straightforward, so neither can all stories be.

The Early Software

I first wrote on computers using Microsoft’s operating system, software, and file formats. The first piece I remember creating for school was an article on space. I wrote it in Microsoft Works, which saves files in the .wps format. I spent most of my time, however, in Word – particularly for my early poetry – which uses the .doc format.

Word worked really well. I was able to focus on the content and even style the documents easily. It had a lot more features than Works, which was another draw. My school’s computers had Word, which meant I could use a single program in multiple places. That was convenient.

I played around with Corel WordPerfect, but it never caught on with me. Using a plain text editor like Notepad never occurred to me. Notepad had zero features compared to Word. Even line wrapping was a pain in the ass.

At home, we had a single computer in the house. The entire family would get time on it, although I’m sure I took up the lion’s share. After PDAs came out, I wanted one badly. Having one with an external keyboard would allow me to write more often and easily. I wouldn’t have to share time on it, nor would I have to type up things originally written by hand. I pictured myself using it all the time. Ahh, to dream. I never did get a PDA, and, as you might have guessed, my life has been a complete train wreck ever since.

Occasionally, I backed my documents up on floppy disks. To back up a piece I wrote about the Titanic, I had to split it across multiple diskettes because it was too large to fit on a single one. But don’t let me fool you: the images I added to the file were the main culprit. It’d take a large amount of text to fill a floppy disk, and I’m not prolific enough for that.

Enter the Web

I recently posted the story of how I became a web developer, which recounts how my love for writing led me to my profession. Along with creating my own websites, I looked for other sites I could use for writing.

Writing.com was a big one. It was a great place to find new things to read, and receive feedback from others. Eventually, I got banned for trying to exploit a loophole on the site. That hurt. A lot. It wiped out all the reputation I’d built up on the site, and I couldn’t connect with people I’d met there. That site later went to crap. I’m not saying that happened because I got banned, but that’s kind of what I’m saying.

I learned of Xanga and used it a decent amount. Other friends like Zach and Fergy had accounts too, and we’d post there with all our latest happenings. I also played around with phpBB as an online bulletin board for chatting with friends. From college onward, I have used other programs to compose and store my writings, like OneNote, WordPress, Google Wave, and, most recently, Evernote.

Left Behind

All in all, I’ve used a decent number of applications. My experiences with them over the years have been enlightening. I’ve experimented with different software and workflows, which has helped me find what works for me. It’s also brought to light some problems.

Remember the space article I wrote in a .wps file? Sadly, my latest copy of Word can’t even open it. Microsoft has already abandoned one of its formats. An open source alternative called LibreOffice can open the file, so I’m still able to access it. For now.

The majority of my early writings were .doc files, and they’re still accessible today. Oddly, some of the files don’t render properly in newer versions of Word, LibreOffice, or Google Docs. This makes it a bit challenging to read through them today.

I hated Apple even more than I hated Microsoft, so I never would have expected this, but I’ve switched operating systems for my daily computing. Software for web development sucks a lot less on a Unix-like platform such as OS X, even though it’s not perfect. Switching OSs has led to my decreased dependence on Microsoft’s software. Now, I don’t have a copy of Word on the computer I use most often.

What about those early websites I created or used? Realm of Aaxiler is dead. Scrawlpoint still exists, but has been defunct for ages. It limps along until I decide to eventually pull the plug. This blows my mind: even my own sites didn’t survive the test of time. I lost everything on Writing.com. At some point, Xanga wiped out a lot of content. Fortunately, I was able to export my stuff before that happened. Not a good track record with these sites, right?

And the more recent pieces of software? Google Wave was technically interesting, but died out pretty quickly. Google shuttered it years ago now, taking all the content with it. I had OneNote installed on my Windows machine, but it didn’t seem like a good option after switching operating systems.

I still use Evernote and WordPress. Evernote is for quickly capturing ideas and notes. I publish my finished pieces to WordPress. Another system I’ve thrown into the mix this year is everything, which is where I flesh out and edit my writings before publishing them.

The Hard Lesson

My main takeaway from using all this software is that letting someone else control or host my content is a poor policy for making it accessible later. Distilling this a bit more:

Software evolves at breakneck speeds and all else is forgotten.

My second takeaway was the slow realization that my creative works are split across many pieces of software and data formats. Each of the applications and formats is a liability. This concerns me, and I don’t like the risk it brings.

One might argue that the content isn’t valuable if I haven’t touched it in so long. And if it were important, wouldn’t I keep it up to date with new formats and new software? Except, this has no analog on paper. I can write on whatever paper I want, and still be able to read it in 50 years. I don’t have to transcribe it every decade or two so that it remains accessible.

Sure, I don’t need to open writings I created 15 years ago, but I like the idea of it. It’s nice to come across pieces later in life, and feel nostalgia for your past. I want a breadcrumb trail of my works to look back on and see what things were like then. There’s also a chance for serendipitous moments when I reconnect with ideas from earlier days.

These lessons are my main motivators to preserve my older writings. I know, I know… I’m a geek. Good thing I’m an adult, and can spend my free time however I like.

The Right Medium

Earlier, I mentioned that my writings are trapped in graveyards, and the preservation I do now should be aimed at surviving technological upheavals. How can I accomplish this?

Choosing the right medium is paramount. What program, service, or format do I use? This is tricky. I want to access the writings today, as well as at some point in the unknown and distant future. Software has the pernicious tendency to rot, but file formats rot as well.

Any format might seem like a good idea, at least initially. Only later would that choice prove dangerous, when it’s no longer possible to open a file. Programs must support reading, displaying, and writing the format. When the format loses enough popularity, software updates will remove support for it. Even the most prevalent formats will fall out of style and be replaced by newer ones.

Software is generally a form of vendor lock-in. Companies invent their own formats, APIs, and syncing mechanisms. Along with being proprietary, they’re often not interoperable. You place your data in their vault, and there it shall forever stay. Graveyards growing larger by the day.

Each program also has limitations and a lifespan. It won’t be around forever, even if it’s hard to imagine now what would take its place. Ten years ago, I would have expected to still use Windows, and to have Word installed on the computer I use most. That’s not the case, though.

Evaluating Formats

I’ve used various formats, and have writings stored in each of them. My goal is to choose a single one for long term use. This way, I could confidently preserve my existing digital works, and have a solid foundation for backing up other written pieces.

Realistically, choosing just one format itself is risky. I should build redundancy into my process. If my content is stored in multiple formats, it has a better chance of surviving. Picking one format as the source, and then another format or two as fallback would be smart.

To choose, I need to evaluate the options and see what software and formats are suited to this archival purpose. I guess I need some ground rules too. These will allow me to compare formats, and find what is most sensible for my needs.

The format must be:

There’ll be personal preference that gets mixed in here too.

How About Services?

I’m hesitant to heavily invest in new services but if something has the right qualities, I’d consider it. If I consider a service, there are some additional rules.

The third bullet point is most important. I won’t put all my eggs into one basket. All services have uncertainty about how long they’ll be around. That’s when vendor lock-in really comes to bite me in the ass. If I rely on one too heavily, and they shut down, my workflow could be impossible to continue. This is an enormous hazard over the long term.

Let’s Go!

Now we’re on to examining the options and deciding whether they’ll work. If not, I’ll explain why they get ruled out.

What About Word?

Microsoft doesn’t support the .wps format any more. Eventually, I’m sure the open source software that supports it will cease to. This format is ruled out immediately.

Many of the files I’m concerned with preserving are .doc. Does it make sense to use it as the format for the rest of my writings?

The .doc format is binary, which means I need special software to be able to work with those documents. It’s not plain text, so I cannot edit it with a basic text editor. The format is also largely proprietary, and Microsoft controls its destiny. Microsoft has published the specification since 2008, but it still doesn’t describe all of the format’s features. People have reverse-engineered the format to create other software that works with it, but this is not a format that meets my criteria above.

Without software which knows how to read .doc, my content is inaccessible. Microsoft has created other formats to take its place. In another 10-15 years, it might not be possible to open these files. Or, if it is possible, it could be quite the hassle. Similar to what’s happened with .wps.

So what about something like the .docx format? It’s newer, and has been made into a standard. The latest versions of Word default to this format. Quite a few programs support it, and it’d likely be straightforward to upgrade my .doc files to .docx. Since it’s backed by XML, the format is, in a way, plain text. But that XML is then zipped, which makes the file itself binary. The advantage is that, even without Word, I ought to be able to extract the content.
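To see how reachable that content is, here’s a minimal Python sketch that pulls paragraph text out of a .docx using only the standard library. It assumes the main body lives in word/document.xml under the WordprocessingML namespace, as the Office Open XML standard lays out. It ignores formatting, headers, footnotes, and images. The function name docx_paragraphs is my own invention, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx file.

    `source` may be a filename or a binary file-like object. Only the main
    document body is read; styling, headers, and footnotes are ignored.
    """
    # A .docx is a zip archive; the body text is one XML part inside it.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Word splits a paragraph into runs; each <w:t> holds a piece of text.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Calling something like docx_paragraphs("space.docx") (a hypothetical file) would give back a list of strings that could be saved as plain text. No Word is required, just a zip reader and an XML parser.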

I’m not excited by the prospect of tying myself to an XML-based format, even if it is plain text. XML isn’t easy to write in without software that hides the XML from you, like Word. And using Word would make me reliant on a proprietary program bound to a single company.

As I mentioned, I don’t use Word much these days. Why bother trying to get myself back into using it? It costs decent money to use and upgrade. It also feels awfully heavy-weight for the writing I do. Sure, there are open-source alternatives that support .docx, but I’ll pass. There are other options left to explore.

What About PDF?

When Adobe’s document-presentation format first came out, there wasn’t an easy or free way to create a .pdf file for my poetry. It was also a proprietary, closed format. Adobe made it an open standard in 2008. Since then, there’s been an explosion in the amount of software that works with PDFs. For example, it’s easy today to print a web page to PDF.

Since it is an open standard with a lot of support, .pdf looks like a good option. Except PDF is a weird format. It’s easy to create, but hard to edit. It is well-suited to viewing works, like a hard copy would be. But how do you easily edit a PDF document? That’s a rabbit hole not even I want to peek in.

Since it’s easy to create PDFs from other formats, I will consider it as one of my fallback options. I can use it to back up the final product, since it’s good at preserving the presentation. But I do not want to author works with it, or use it as my system of record.

What About Evernote?

An obvious candidate for authoring and backing up my writings is Evernote. I’ve used it constantly over the last several years. I even pay for the Premium version. It’s particularly adept at taking notes on multiple devices. It has plenty of space, as well as note history. It’s software that’s made for this stuff.

But Evernote uses proprietary software and formats to store the data. My experiences with it leave me wary of depending on it heavily. Some are already calling Evernote the first dead unicorn. I don’t know if that’s true, but I’m sure Evernote as a company has a limited lifespan. The service is the software, and it won’t be possible to use the Evernote software once the company shuts down. That rules it out as my format or as the archive for my writing.

What About WordPress?

WordPress is another application I’ve used for a long time. It’s great at hosting content. The content is mine, and I can export it easily.

WordPress does, however, store that data in a database. The database makes it easy to dynamically build a site, but it’s not a plain text format I can open up with any text editor. Then, the nature of it being a web application means I need a browser and Internet connection to use it. Those are dependencies I don’t really want.

If hackers gained access to my site, I could lose everything. Fortunately, WordPress has plugins to automate data backups to other services like Dropbox. This helps mitigate some of that risk, but it’s just not designed for the kind of use I want.

I plan to continue hosting my blog on WordPress, especially since I’ve automated publishing my finished pieces. But I still have to find something else for preserving and backing up my writing collection.

What’s Left?

I’ve covered what formats and programs and services I won’t use, for various reasons. But what’s left? What can I use? I suppose I should tell you that this post hasn’t been me walking myself toward a decision. It’s been me verbalizing decisions I’ve already made. Trying to understand myself a bit more, and provide insight to anyone else interested. I’ve already settled on a format to use, and have been successfully using it for a majority of this year.

My format is plain ol’ text files. For styling, I use the Markdown syntax. To indicate this, the files have a .md extension. Markdown draws inspiration from how plain text emails have been formatted over the years. There are conventions for indicating things like emphasis, links, quotes, code blocks, and headers. Best of all, I only use regular characters to mark up my content, and these characters are easy to type on any keyboard. This makes it incredibly easy to start and keep using.

Markdown is also a program which converts this plain text syntax to HTML. I’m currently writing this post in Markdown. I’ll later run it through some code that converts it to the HTML you are now reading on this blog.
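To give a feel for what that conversion does, here’s a toy Python converter for a tiny slice of the syntax: headers, paragraphs, emphasis, strong emphasis, and links. It is not Markdown.pl or any real implementation. It ignores lists, quotes, code blocks, and the many edge cases real tools handle. The function names are hypothetical.

```python
import html
import re


def inline(text):
    """Convert a few inline Markdown spans to HTML (toy subset only)."""
    text = html.escape(text)  # keep literal <, >, & safe in the output
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)  # before *em*
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def markdown_to_html(source):
    """Toy converter: '#' header lines, and paragraphs split by blank lines."""
    out = []
    paragraph = []

    def flush():
        # Emit the pending paragraph, joining its wrapped lines with spaces.
        if paragraph:
            out.append("<p>" + inline(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for raw in source.splitlines():
        line = raw.strip()
        header = re.match(r"(#{1,6})\s+(.*)", line)
        if header:
            flush()
            level = len(header.group(1))
            out.append(f"<h{level}>{inline(header.group(2))}</h{level}>")
        elif not line:
            flush()
        else:
            paragraph.append(line)
    flush()
    return "\n".join(out)
```

For instance, markdown_to_html("# Hello\n\nSome *text*.") produces an h1 element for "Hello" followed by a paragraph with "text" wrapped in em tags. Even this crude version shows why the source stays readable: the asterisks and pound signs already say what the styling should be.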

How long has Markdown been around? Since 2004. That’s not very long, in the grand scheme of things. But it has a lot of pros. It is plain text, well-supported, open, very easy to write in, platform agnostic, easy to back up, easy to version control, and I can create these files on my computer, without needing an Internet connection. Markdown also has a boatload of support on the web.

One downside may be that there are quite a few different implementations of the HTML conversion. The original syntax is ambiguous in certain cases, so programs differ in subtle ways. There are other things the original Markdown syntax doesn’t cover, and extensions of the syntax have popped up, which help to make it more powerful and comprehensive. These extensions might conflict with one another, though.

The awesome thing about Markdown is that even if the software to convert my documents to HTML disappeared, the text would still be usable. Any text editor can open these plain text files. And the syntax itself is a good indication of what the desired styling is. This makes me confident that Markdown can stand the test of time.

Since Markdown (and its various flavors) has a lot of traction, there are programs to generate HTML pages and PDF documents from Markdown-styled text. That’s a big win. Creating those fallback documents containing the content and style is suddenly much easier.

Another huge benefit is that I’m not tied to any proprietary software, or to services that could shut down, and the syntax is free to use. I don’t need any special software to use it.

To write my pieces, I’m quite fond of using the free and open source vim. I’ve used it for years to write code, and it feels natural to extend it to my prose as well.

What About Plain HTML and CSS?

Markdown converts to HTML, and that page can then be styled using CSS. Why not cut out the Markdown middleman and go straight to HTML and CSS? They’re also open formats, plain text, platform agnostic, and all the rest.

HTML and CSS are even more widely supported than Markdown. Think of how many billions of web pages use HTML and CSS. They are technologies that will have a long tail. You can even view the very first web page ever created, and it renders perfectly in modern browsers. That’s incredible, considering it dates from 1991.

I could do this, certainly. But typing the HTML while I write would feel clunky. I’m not trying to write a web site as I flesh out a blog post. I’m trying to work on the words themselves. Markdown seems like a nice compromise. I can work on the content, easily add some markup as I go along, and then convert it to HTML when I’m ready to publish. At that point, I can consider other details of styling, if the document requires that attention.

The HTML output from Markdown is itself another fantastic fallback document. It essentially comes along for free, thanks to the qualities of Markdown.

The Backup Strategy

Additionally, I need a process for backing up the writings. I want them safe, even if it’s years before I access them again. A successful backup strategy would allow me to lose one copy of a writing, and still be able to recover that document. This way, an accident, user error, or hardware failure wouldn’t mean a piece disappears forever.

In reality, the best backup strategy is actually several redundant backup strategies. Having many copies in many locations is even better for covering my ass.
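A backup only counts if it really matches the original. One way to check that is to compare every copy against the source folder. Here’s a minimal Python sketch, assuming each backup can be reached as a local folder (say, a synced Dropbox folder or a restored snapshot). The function names are hypothetical and not part of any backup service.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_copies(primary, backups):
    """For each backup folder, list files that are missing or differ."""
    reference = snapshot(primary)
    report = {}
    for backup in backups:
        copy = snapshot(backup)
        report[str(backup)] = {
            "missing": sorted(p for p in reference if p not in copy),
            "changed": sorted(
                p for p in reference if p in copy and copy[p] != reference[p]
            ),
        }
    return report
```

A backup whose lists come back empty holds a full, identical copy of every piece. Anything listed points to exactly which writings would be lost if that were the only copy left.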

My hesitation about services dissipates when I don’t rely on them too heavily, that is, when a service isn’t a single point of failure. If I use a service as a backup destination, I feel better about it, because it’s one of several.

I love services like CrashPlan and Dropbox, because they’re transparent. I can use both at the same time, seamlessly. They back up files on my hard drive, which is about as unobtrusive as you can get. Backup strategies are fantastic when you can layer or combine them. For instance, it’s easy to throw in another option like Apple’s Time Machine, without changing anything about the others. They just work like normal.

I could even consider Evernote as a backup site. Evernote documents are HTML, so I should be able to take the HTML output from my Markdown piece, and stick it into a note. I’ll have to look into automating this process in the future.

Karla suggests I should print out hard copies and store them in a safe deposit box. Storing backups away from the home is a good way to keep a natural disaster, burglary, or fire from wiping out my data, but I wonder what the costs and effort associated with that are. Like she’s said, it’s not as elegant as the other options, but it is certainly interesting as a last resort. It could be the fallback to all my other fallbacks.

Version Control

For version control, I use git. It’s been around for 10 years, and has strong support around the globe. I’m familiar with git because I use it to version control software, and it turns out to be a great option for putting plain text writings under version control too.

This allows me to keep a history of changes I make to pieces. And git gives me the ability to push my writing repository to multiple places, like BitBucket or GitHub. Each of these places has a full copy of the history, which means they can serve as backups too. This is yet another way to back up my content, along with keeping track of my changes.

Conclusion

I’m using plain text Markdown files to store my writings. I write using vim. I can convert them to HTML and PDF pretty easily, as fallback formats. I store the Markdown writings in a git repository, and these all get backed up to BitBucket, Time Machine, Dropbox, and CrashPlan. This feels like a robust way to back up my digital writings. It’s something that also ought to scale as I back up more writings in the future.

It’s been helpful to think through these decisions, and I hope this might help someone else. I’d love to hear your thoughts, if you’ve considered these topics at all. I’m sure there are things I haven’t considered, or things I’ll only learn down the road.

I’ve got other goals for writing-focused, web-based software, which I hope to explore. Writing this year helps give me more experience to use when I undertake those software projects in the future.

Thanks for reading, and here’s to writing!