In an earlier blog post, I discussed the rise of machine learning technology, which I believe could aid and power advances in global digital publishing.
There are three classes of machine learning tools and technologies, each of which has the potential to benefit book publishers and readers alike.
In this article, we will explore the first class: the technology at the core of Natural Language Processing (NLP), and how it can be applied to digital book publishing.
Google estimates the global number of commercially available print and digital books at over 130 million, with Amazon claiming a commercially available list of over 32 million titles.
Some 675 million books were sold in the US in 2018, and more than 75% of the books sold online were via Amazon, netting revenue of approximately $5.4bn. Nielsen BookScan data tells us that there are approximately 325,000 new-edition book titles each year in English.
Audiobooks, which have been around for decades, have recently begun to take off. In 2019, the audiobook market accounted for 60,000 new titles in the USA, with $1.2bn in revenue, up 25% over 2018. These are tiny numbers compared with the total number of new print books and eBooks in the same period.
The labour intensity and cost of producing audiobooks severely limit the number of new audiobooks coming to market compared with the number of print books or eBooks published in the same year.
It is in the context of audiobook production that I'd like to present a use case for adopting Natural Language Processing to take audiobooks to the next publishing level.
Perhaps one of the most exciting uses of natural language processing in the digital publishing industry relates to the development of advanced augmented audio books.
Historically, audiobooks have been produced by making a digital audio recording of a reader narrating a book, which is then sold alongside the print and digital eBook editions for purchase and download.
The drawback of mass-producing audiobooks is the effort and complexity of audio production: typical recordings cost around $1,500 per hour, or $2,000 for a 50,000-word reading. It is not uncommon for an audiobook to cost $5,000 to produce, and if a known actor, or multiple actors, narrate it, the costs can run into the tens of thousands of dollars.
Text-to-speech first appeared in the 1950s with automated voices. However, early attempts sounded closer to science fiction than reality because of their robotic sound quality, perhaps best personified by Professor Stephen Hawking's text-to-speech voice.
The rise of the computer game industry in the 1990s drove the next wave of computer-based speech processing. More recently, the sector has been led by the likes of Amazon and their quest to sell us smart hardware under the banner of a personal AI avatar, Alexa. The many technology-driven developments pushing towards authentic digital human speech also include avatar-based digital translation technology from Google and Microsoft.
At a high level, natural language processing breaks text up into processable blocks, usually at the sentence level, with increasingly sophisticated software handling periods, numbers, symbols, and phonetic issues, such as gauging the correct pace and pitch of narrated speech. The output of that stage is then passed through a digital signal processor to convert the content to audio.
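To make that pipeline concrete, here is a minimal, rule-based sketch of the text-processing stage described above. It is purely illustrative: real TTS front ends use far more sophisticated models, and the sentence splitter and number/symbol expansion rules here are simplistic assumptions of mine, not how any particular product works.

```python
import re

# Illustrative word forms for a handful of digits; a real system would
# handle arbitrary numbers, dates, currencies, abbreviations, etc.
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}

def split_sentences(text: str) -> list[str]:
    # Naive split on sentence-ending punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def normalise(sentence: str) -> str:
    # Expand a couple of common symbols, then isolated single digits,
    # into speakable words.
    sentence = sentence.replace("%", " percent").replace("&", " and ")
    return re.sub(r"\b([1-5])\b", lambda m: NUMBER_WORDS[m.group(1)], sentence)

text = "Chapter 3 begins here. Sales rose 5% last year!"
blocks = [normalise(s) for s in split_sentences(text)]
# blocks → ["Chapter three begins here.", "Sales rose five percent last year!"]
```

Each normalised block would then be handed to the speech-synthesis stage, which turns the cleaned-up text into an audio waveform.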
A great example of this is Amazon Polly on AWS, which demonstrates how recent advances in natural language processing and digital signal processing can make text-to-speech extremely affordable whilst delivering audio quality that keeps a book easy to follow and understand.
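Services like Polly accept SSML (Speech Synthesis Markup Language), which lets a publisher control the pace and pitch mentioned earlier. The helper below is a hypothetical sketch of how chapter text might be wrapped in SSML; the rate and pitch values are illustrative, not tuned recommendations.

```python
def to_ssml(paragraphs: list[str], rate: str = "medium", pitch: str = "medium") -> str:
    # Wrap each paragraph in <p> tags and the whole text in a single
    # <prosody> element controlling narration speed and pitch.
    body = "".join(f"<p>{p}</p>" for p in paragraphs)
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{body}</prosody></speak>'

ssml = to_ssml(["Call me Ishmael.", "Some years ago, never mind how long."], rate="slow")
```

The resulting string could then be submitted to a synthesis call such as `boto3.client("polly").synthesize_speech(TextType="ssml", Text=ssml, OutputFormat="mp3", VoiceId="Joanna")`, giving an audio stream ready to package as an audiobook chapter.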
Future NLP technology will allow the development of cheap, high-quality text-to-speech audiobooks, dramatically driving down the cost of audiobook production and bringing far more audiobooks to market than are available today.
Another potential use of Natural Language Processing is book discovery.
Our available reading world has never been bigger. However, with greater choice comes the problem of selection.
I believe the use of natural language processing to empower discovery and improve readability could enhance reader choice and, therefore, increase book sales.
One of the challenges I find as a reader is deciding which author to read next. Most readers have a favourite genre. Broadly speaking, readers know which types of books will give them the most pleasure, and I am no exception. Despite knowing this, I am always left with one question: who is going to be my next favourite author?
This problem has traditionally been well served by referrals from family and friends and, perhaps, the more old-fashioned routes of libraries and book clubs, or simply time spent browsing Barnes & Noble or Waterstones for my next favourite author.
Of course, the weekly book trade magazines Publishers Weekly and The Bookseller do a good job of keeping us informed of the bestseller lists and the best new authors.
In more recent times, we have seen a raft of book promotion and recommendation sites such as Goodreads and LoveReading, as well as recommendations from the publishers themselves.
The simple fact is that none of these sources know what type of books I like. If we're honest, how many times have we read thirty or forty pages of a book before losing interest and abandoning it altogether? I am happy to hold my hands up to that.
Machine learning offers us the vision of a personal digital avatar that monitors what we read and works out our collected preferences, bringing different authors to our attention when we are in the mood for something new.
The concept of a personal digital avatar (PDA) is not new; indeed, I have spent an unhealthy chunk of my software development life and quite a few million dollars trying to develop such a platform, without success.
In principle, it is not difficult to understand. I am a massive fan of Bernard Cornwell and his historical fiction. I have probably purchased every book he has written and watched every television programme and film his books have been turned into, and that has drawn my attention to similar authors, such as Patrick O'Brian, Conn Iggulden and many more.
A PDA will have to be capable of understanding my preferences and scanning for new and unread alternatives. Netflix makes a surprisingly good attempt at ranking your viewing preferences and suggesting ranked alternatives. Unfortunately, Netflix has no understanding of who I am or what my moods are.
A PDA will need to be sophisticated enough to interpret my interests and understand my pattern of behaviour.
It will need a very sophisticated natural language processing capability that is constantly refined and iterated. It will require both speech-to-text and text-to-speech capabilities and, as it is an avatar, it will need to be mine.
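The preference-matching idea at the heart of such a PDA can be sketched very simply. The toy example below assumes each book has been reduced to a bag of descriptive keywords (genre tags, themes); a real system would use learned embeddings built from the text itself, but cosine similarity over keyword counts shows the basic mechanics. The titles and tags are, of course, made up for illustration.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two keyword-count vectors.
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Profile built from the reader's history (repeated tags carry more weight).
reader_profile = Counter(["historical", "military", "fiction", "historical"])

candidates = {
    "Master and Commander": Counter(["historical", "naval", "fiction"]),
    "Space Opera X": Counter(["science", "fiction", "space"]),
}

# Rank unread candidates by similarity to the reader's profile.
ranked = sorted(candidates, key=lambda t: cosine(reader_profile, candidates[t]),
                reverse=True)
# ranked[0] → "Master and Commander"
```

A Cornwell reader's profile, heavy on historical and military tags, naturally pulls O'Brian-style titles to the top of the list, which is exactly the "who next?" recommendation the PDA is meant to deliver.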
A successful generation of PDAs will have many uses and will help us interact with the increasingly fast-paced digital world around us. In turn, this will allow us to make better choices with more informed outcomes and, most importantly, to make the most of our precious downtime.
If I can think about it, what is stopping you from starting a discussion?
I would appreciate your views.
Try our print and digital publishing platform for free today.
Contact us and we will provide the best solution to suit your digital publishing needs.
Get all the latest blogs straight to your inbox!