Machine Learning: A Precursor to Future SMART Manuscript Styling

James Macfarlane

October 12, 2020




The global publishing market faces that challenge today!

"The pressure is on to publish much more, but with fewer internal resources," say top executives at Penguin Random House and HarperCollins.

The simple response by publishers in the past decade has been to outsource, and in many cases offshore, basic editorial and production tasks to third parties.

Historically this has been a solution, but for many publishers one which has had varying degrees of success.

The rise of editorial and production resource pools in India, China, elsewhere in Asia, Eastern Europe and now Central and South America, with cheaper hourly rates, has met that offshore demand for lower editorial and production costs. But at what price?

Clearly, the ability of offshore vendors to provide language-skilled editorial and production workers can be a limiting factor. As John Pettigrew of Futureproofs comments, “I’ve rarely seen British, or American-English proofreading moved overseas where the product is for a domestic market - a fluent speaker in one country has a different dialect to a fluent speaker in another country and it’s hard to work successfully across those barriers.” John Bond of White Fox makes a similar observation for editorial copy-editing.

Another issue can be that, for some publishers, time-zone differences stretch the turnaround times for editorial copy and make offshore outsourcing tricky. US publishers often prefer to outsource to the Far East rather than India due to there being a 10 hour time-zone difference with the east-coast US publisher. This is perhaps not so much of a concern with suppliers working on a 24-hour basis.

More recently, currency fluctuations in the UK and US markets have made forward planning on budgets harder and have led to variable costs, certainly for UK publishers.

It's not unusual for production work to take 24 to 72 hours to turn around. In the US there is the additional pressure of Donald Trump's political spectre urging corporations to onshore work which had previously been sent offshore.

Increasingly, in faster-moving markets such as digital-first trade publishing, which look to promote volume, the pressure for quick turnaround means that offshoring can lead to a higher level of errors. Using appropriate production infrastructure to get the job done locally with existing trained staff is an alternative approach. As resources in publishers become stretched, loss of control in the editorial production process can lead to increased stress and missed internal publishing targets.

The answer for publishers must be to automate their existing editorial and production workflows!

How can Technology HELP?

Perhaps the very first thing a publisher receives is the author's manuscript.

The processing of the manuscript into a consistent, publisher-friendly format of known layout is perhaps the first editorial rung on the publishing ladder. It is a step which, for many editorial assistants and editors, can be a frustratingly time-consuming process.

The Society of Authors published a report a few years ago on "how authors write their manuscripts". The report noted that 92% of authors wrote their manuscripts using Microsoft Word. Beyond that, there is virtually no commonly adopted layout, other than the author naturally breaking their story roughly into chapters.

Every author's manuscript is about their story first and foremost. The effort authors put into manuscript formatting, spelling and grammar varies wildly. The publisher needs to prepare the manuscript to a consistent layout, which is usually the job of their first tier of editorial production, before the manuscript can be moved to rounds of copyedit corrections and then proofreading before final internal approval to print.

The time taken to process this first step depends on how good your MS Word styling skills are and how many MS Word macros you've constructed. As a rough rule of thumb, most current bestseller manuscripts are around 120,000 words long (that translates to approximately 280 book pages at a font point size of 10.5). Manual styling in MS Word can take between two and five hours, depending on the length and original state of the manuscript.

During the early editorial interventions between the author and the publisher the manuscript may go back and forth many times, but if we say that the manuscript is restyled four times then the editorial team can easily spend up to 10 hours on manuscript styling. At UK costs of £60 per hour ($75 per hour in the US), that translates to £600 ($750) per title in editorial time. For a larger publisher processing, say, 700 trade titles per year, that equates to staffing costs of around £420,000 ($525,000) per annum.

If the publisher is under pressure to publish more titles, let's say an extra 100 titles per year, then manuscript handling alone adds another £60,000 ($75,000) per annum; or, more importantly, another member of editorial staff, or more work for existing staff.
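The arithmetic above can be checked with a short sketch. It uses only the rates and volumes quoted in the text; the function name and the figure of 2.5 hours per styling pass (chosen so that four passes give the 10 hours mentioned) are illustrative assumptions.

```python
def styling_cost(titles: int, passes: int = 4, hours_per_pass: float = 2.5,
                 hourly_rate: float = 60.0) -> float:
    """Editorial cost of manual manuscript styling for a number of titles.

    Assumption: 4 passes x 2.5 hours = 10 hours per title, as in the text.
    """
    hours_per_title = passes * hours_per_pass
    return titles * hours_per_title * hourly_rate


per_title_gbp = styling_cost(1)                      # one title at GBP 60/hour
annual_gbp = styling_cost(700)                       # 700 trade titles per year
annual_usd = styling_cost(700, hourly_rate=75.0)     # same volume at USD 75/hour
extra_gbp = styling_cost(100)                        # 100 additional titles

print(f"Per title: £{per_title_gbp:,.0f}")
print(f"700 titles: £{annual_gbp:,.0f} (${annual_usd:,.0f})")
print(f"Extra 100 titles: £{extra_gbp:,.0f}")
```

Changing `passes` or `hours_per_pass` shows how sensitive the annual figure is to the number of restyling rounds.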

Today's Technology:

Easypress has been working with Bonnier, Endeavour Media and several other major publishers, developing a SMART solution to act as a productivity tool for fast and accurate manuscript styling. This tool starts from the premise that the software understands how a book needs to be ordered and structured into Parts, Chapters, Sections, Paragraphs, Sentences, and Words.

The software then allows a range of approved styles to be applied at each level across that manuscript. A user simply loads the MS Word manuscript, and a guided process steps the user through each element of the manuscript, applying each decision to every other identical part of the manuscript on an iterative basis.
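The idea of making one decision and having it applied to every identical element can be illustrated with a minimal sketch. This is not Easypress's implementation: the `Signature` fingerprint, the `Paragraph` class and the style names are all hypothetical, and a real tool would read the formatting from the Word file rather than from hand-built objects.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical formatting fingerprint of a paragraph in the manuscript."""
    font_size: float
    bold: bool
    centered: bool


@dataclass
class Paragraph:
    text: str
    signature: Signature
    style: str = "Unstyled"


def group_by_signature(paragraphs):
    """Group paragraphs that look identical, so each group needs one decision."""
    groups = defaultdict(list)
    for p in paragraphs:
        groups[p.signature].append(p)
    return groups


def apply_decisions(paragraphs, decisions):
    """Apply each user decision to every paragraph sharing that signature."""
    for p in paragraphs:
        if p.signature in decisions:
            p.style = decisions[p.signature]
    return paragraphs


heading = Signature(font_size=16.0, bold=True, centered=True)
body = Signature(font_size=11.0, bold=False, centered=False)
manuscript = [
    Paragraph("Chapter One", heading),
    Paragraph("It was a dark night.", body),
    Paragraph("Chapter Two", heading),
    Paragraph("Morning came.", body),
]

# The user decides once per distinct signature, not once per paragraph.
decisions = {heading: "Chapter Title", body: "Body Text"}
apply_decisions(manuscript, decisions)
print(len(group_by_signature(manuscript)), "decisions for", len(manuscript), "paragraphs")
print([p.style for p in manuscript])
```

In a 120,000-word manuscript the number of distinct signatures is typically far smaller than the number of paragraphs, which is where the time saving would come from.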

What do we know about book publishing?

Firstly, the majority of books tend to fall into fixed layouts or print formats. Although the format names vary across publishers, formats such as "B" Format, Demy, Royal, etc. all refer to the typical trade formats we purchase in Barnes & Noble, Waterstones and airport bookshops.

Secondly, the structure of the book hardly varies: a "front-text" section consisting of the Title Page, Half-titles, Copyright, Dedication, Chapter Title list, etc.; the body-text from the manuscript; and a "back-text" section, which may consist of "about the author", other books by the author, marketing inserts, bibliography, references, etc.

Finally, within the "body-text" section the manuscript is hierarchically structured into parts, chapters, sections, sub-sections, first-paragraphs, paragraphs, sentences and words. In publishing, each of the above components is assigned a style to denote the use of font types, font faces, font styles and font colours; similarly, characters have attributes to describe how the text will look on either the printed page or the digital layout.
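As a rough illustration of this hierarchy, the sketch below models a book as a tree and walks it to assign a style to each component, treating the first paragraph under a heading differently from the paragraphs that follow. The `Node` class, the `STYLES` table and the style names are assumptions made for the example, not a publisher's actual style sheet.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                 # "book", "part", "chapter", "section" or "paragraph"
    text: str = ""
    children: list["Node"] = field(default_factory=list)


# Hypothetical mapping from structural component to style name.
STYLES = {
    "part": "Part Title",
    "chapter": "Chapter Title",
    "section": "Section Heading",
    "first-paragraph": "First Paragraph",
    "paragraph": "Body Text",
}


def styled_lines(node: Node) -> list[tuple[str, str]]:
    """Depth-first walk returning (style, text) pairs in reading order."""
    lines = []
    if node.kind != "book":
        lines.append((STYLES[node.kind], node.text))
    seen_paragraph = False
    for child in node.children:
        if child.kind == "paragraph" and not seen_paragraph:
            # The first paragraph under a heading gets its own style.
            lines.append((STYLES["first-paragraph"], child.text))
            seen_paragraph = True
        else:
            lines.extend(styled_lines(child))
    return lines


book = Node("book", children=[
    Node("part", "Part I", [
        Node("chapter", "Chapter 1", [
            Node("paragraph", "Opening line."),
            Node("paragraph", "Second line."),
        ]),
    ]),
])

for style, text in styled_lines(book):
    print(f"{style}: {text}")
```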

This hierarchy exercises all publishers and very often defines the consistent look and feel of the publisher's imprint. Perhaps the best examples of this are John Wiley's "Dummies" or Penguin's "Classic" series: many different books, contents and subjects, but all laid out and published in a manner which is instantly recognizable to the reader.

Using this repetition is very helpful for processing books accurately.

Easypress uses an advanced rules-engine to separate book design from structure; structure from layout; and layout from content. In this way, all well-designed books can be disassembled for conversion from one digital format to another, or can be reassembled to construct a desired layout for print or digital publishing.
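The benefit of keeping content, structure and layout apart can be shown with a small sketch: the same structured content is rendered twice, once with a print layout and once with an eBook layout, without touching the text itself. This is only an illustration of the principle, not the Easypress rules-engine; the role names, fonts and layout tables are invented for the example.

```python
import html

# Content tagged with its structural role; no layout information here.
content = [
    ("chapter-title", "Chapter 1"),
    ("paragraph", "It began quietly."),
]

# Two hypothetical layouts for the same structure.
PRINT_LAYOUT = {
    "chapter-title": {"font": "Garamond", "size_pt": 18},
    "paragraph": {"font": "Garamond", "size_pt": 10.5},
}
EBOOK_LAYOUT = {"chapter-title": "h1", "paragraph": "p"}


def render_print(content, layout):
    """Describe each element with the print font and size for its role."""
    return [
        f"{layout[role]['font']} {layout[role]['size_pt']}pt: {text}"
        for role, text in content
    ]


def render_ebook(content, layout):
    """Wrap each element in the HTML tag chosen for its role."""
    return "".join(
        f"<{layout[role]}>{html.escape(text)}</{layout[role]}>"
        for role, text in content
    )


print(render_print(content, PRINT_LAYOUT))
print(render_ebook(content, EBOOK_LAYOUT))
```

Swapping in a different layout table changes the output format while the content list stays untouched, which is the property that makes conversion between formats mechanical.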

Easypress' Atomik ePublisher delivers the following features:

  • Automated and consistent styling of your MS Word manuscript.
  • Automatic ingesting and laying out of your manuscript in Adobe InDesign.
  • Automated typesetting of the Print book removing widows, orphans, correcting tracking; and correct page and chapter endings.
  • Composition of the final book layout with front-text and back-text copy, inserts and title pages in the correct order.
  • Automatic eBook to EPUB3 production with integrated accessibility built in to the eBook.

All delivered in under an hour.

On top of this, the Atomik ePublisher comes with an automatic publishing repository in an open architecture, with high levels of security and the ability to make late-stage, on-page edits.

What is the result?

The result is simple. The time taken to style an MS Word manuscript, auto-typeset a book, or create a digital eBook can be reduced from hours to minutes.

In a recent test across 30 titles for one publisher, 60 hours of traditional manuscript styling (approximately two hours per manuscript) was reduced to 20 minutes per title, a whopping reduction of more than 80% in the time taken to style 30 titles.
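The saving in that test can be worked through in a few lines, using only the figures quoted above (30 titles, about two hours each before, 20 minutes each after); the function and variable names are illustrative.

```python
def reduction(before_minutes: float, after_minutes: float) -> float:
    """Fractional reduction in time taken."""
    return (before_minutes - after_minutes) / before_minutes


titles = 30
before_per_title = 120   # minutes of manual styling per manuscript
after_per_title = 20     # minutes per manuscript with the tool

total_before_hours = titles * before_per_title / 60
total_after_hours = titles * after_per_title / 60
percent_saved = round(100 * reduction(before_per_title, after_per_title), 1)

print(f"Before: {total_before_hours:.0f} hours, after: {total_after_hours:.0f} hours")
print(f"Reduction: {percent_saved}%")
```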

Productivity is the key to producing more with fewer resources.

To achieve this, publishers will need to consider two things. They'll need a change of thinking to look at how to embrace change, and they'll need to invest in the effort to adopt advanced productivity tools that meet future business needs.

In return, technology suppliers need to develop technology which doesn't force users to alter their workflows to meet the demands of the technology.

Successful automation should ensure that publishing workflows still work the way they've always worked but are faster, carry greater precision and ensure less impact on publishing resources.

James Macfarlane

(I would welcome your views at
