Your Audiobook Will Be Less Popular After a Few Months
Visitor traffic to most audiobooks in the Internet Archive (IA) Librivox collection is impressive in the first weeks after an audiobook appears, unless it first appears without its cover. (The level of traffic to audiobooks at Librivox.org is unknown, because Librivox management does not collect any metrics.) But traffic to IA audiobooks falls off sharply in the months after audiobooks are ‘published’.
Visitors to the collection probably sort for the most recent or the most popular audiobooks, and browse from the top of the list. But new audiobooks soon sink deep into the collection of 19,000 books, and won’t appear in these casual browses. Visitors also search for well-known titles and authors. However, because our audiobooks are a century old and more, the authors and titles of more than 9 of every 10 audiobooks are obscure or forgotten.
Searching by Subject in Librivox.org is Not Effective
We depend on subject searches by site visitors to circulate most of our audiobooks as they age in the catalog. Unfortunately the subject search function in Librivox.org doesn’t work well. A search only works well when it is a single word that matches a single-word topic in a book. Different Book Coordinators have often used different words or phrases for the same subject, so searching for all the audiobooks on a subject requires the user to guess all the words, including singular and plural forms, that may have been used.
If a visitor tries to use a more precise search term of two or more words in the Librivox.org search engine, that triggers a search for each of the words separately, and the results are then combined together. So, for example, one would expect that a search for “adventure fiction” should get narrower, more accurate results than simply “adventure”. But Librivox.org instead searches for ‘adventure’ and ‘fiction’ separately, and adds the results of both together, providing hundreds of marginally relevant results.
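To make the difference concrete, here is a minimal sketch of the two behaviours on a tiny made-up catalog. The book identifiers, topics, and function names are all hypothetical; this only illustrates "match any word" versus "match all words" and is not Librivox.org's actual code.

```python
# Toy catalog: identifiers mapped to topic words (hypothetical data).
CATALOG = {
    "book_a": {"adventure", "fiction"},
    "book_b": {"adventure", "travel"},
    "book_c": {"fiction", "romance"},
    "book_d": {"history"},
}


def search_any(words):
    """Match books that carry ANY of the words (results added together)."""
    wanted = set(words)
    return sorted(book for book, topics in CATALOG.items() if topics & wanted)


def search_all(words):
    """Match books that carry ALL of the words (what a visitor would expect)."""
    wanted = set(words)
    return sorted(book for book, topics in CATALOG.items() if wanted <= topics)


print(search_any(["adventure", "fiction"]))  # broad: every book with either word
print(search_all(["adventure", "fiction"]))  # narrow: only books with both words
```

On this toy data the "any word" search returns three of the four books, while the "all words" search returns only one.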
With no data available about visitor searches or traffic on Librivox.org, we have to guess at how much success visitors have. Because of these limitations of the Librivox.org search engine, it seems probable that few visitors find most of the available audiobooks on their subject.
Librivox management has recently recognized that subject searches work much better in Internet Archive than in Librivox.org, and inserted a message on the Librivox ‘Advanced Search’ page encouraging visitors to take their search to Internet Archive. I also encourage Book Coordinators and Soloists, when writing their ‘Topics’ and ‘Summary’, to have in mind the Internet Archive’s search engine capabilities. I have described them on Search Internet Archive for Librivox Audiobooks at this Century Past website.
Librivox Genres
Librivox was designed to search for genres, and has a list of over 140 genres, accessed from the Librivox.org front page. Even with so many genres, many are a bit broad and now hold hundreds of books. A user must review search results one screen at a time, with each screen showing 25 books, so reviewing a large group of search results is cumbersome. Even so, Book Coordinators should be sure to fill in appropriate genres in their book, so it will be found by the unknown number of site visitors who choose to use this tool.
Two Ways to Have Your Audiobook Heard More Often
Readers do most of the work creating audiobooks, but it is the Book Coordinator’s responsibility to ensure that the audiobook is found by as many individuals and audiobook website managers as possible. Do both of these things to enable the long-term popularity of an audiobook:
- Choose a book that is popular in its text version.
- Write topics and a summary that will work well in visitors’ searches at Internet Archive.
I’ll cover below how to accomplish both of these approaches.
Find a Popular Public Domain Book for Your Audiobook
The first step in recording an audiobook that will be popular among listeners is to select a book that is already popular in its text version. If you’re determined to record a particular title, feel free to do so. However, if you have a subject in mind but you’re flexible on the title, you can research which public domain books in that subject already get a high number of views. Keep in mind of course that when a visitor views a book, that doesn’t necessarily mean they are reading it. We have no data on how many books or audiobooks are actually read.
Librivox is strict about only releasing books that are in the public domain. Here’s the guidance from the Librivox wiki (from “Determining Public Domain Status”):
“Published works that fall into one of the following categories may be included in the Librivox catalog:
- Works published 96 years ago or earlier (the copyright has expired in the U.S. on these works),
- Works authored by the U.S. Government (these works are not eligible for U.S. copyright protection),
- Works which Project Gutenberg has determined are in the public domain.”
Where to Research the Popularity of Books
At The Internet Archive:
1. To get viewing metrics for Internet Archive books, go to ‘All Books’ (archive.org/details/books). In the left column find ‘Year’ and click ‘More …’ at the bottom of that section. The ‘Select Filters’ box opens, and at the top is a selector for ‘Sort by:’. Choose ‘Year’. Most people will want only the years before 1928, as you need to be confident the books are in the public domain. You can select as many years as you wish, and then click ‘Apply Filters’.
2. When you’re back on the ‘Books’ page with the results, type subject: into the ‘Search’ box, followed immediately (without a space) by your subject in quotes. Like this: subject:"Africa exploration". Make sure ‘Search metadata’ (under the subject line) is selected. Click ‘Search this collection’.
3. When you search for an author, use creator: instead of ‘author’. You can also search by title with title:.
4. When analyzing the results, first choose Sort by: Relevance. If you’re using a PC rather than a phone, you can hover your cursor over books and see their Topics, which are mostly Library of Congress Subject Headings.
5. Remaining on the page with search results, change the Sort by: to Weekly views and look at the number under the eye icon. Beware that occasionally that number will actually be all-time views. ‘Weekly views’ only tracks views for the previous seven days, so it may be misleading, as you’re interested in the long-term average.
6. Sort by All-time views. Be cautious with this number too, as books that have been on the Internet Archive a long time have an advantage. Open a book entry and look in the bottom right corner of the metadata to find the upload date.
7. The Internet Archive often has multiple copies of one title. To know how popular a book is, you need to look at every copy. Do a title search (title:"type title here") and add up all the views for all copies and versions of the book.
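If you are comfortable with a little scripting, the same lookup can be sketched in Python. This is only a sketch under assumptions that go beyond the steps above: it assumes the Internet Archive's advancedsearch.php endpoint, and that the `downloads` field it returns corresponds to the all-time view count shown under the eye icon. The function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's public search endpoint.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_query_url(field, phrase, rows=100):
    """Build a search URL for a quoted phrase in one metadata field.

    `field` is a search prefix such as "subject", "creator" or "title".
    """
    params = [
        ("q", f'{field}:"{phrase}" AND mediatype:texts'),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("fl[]", "downloads"),  # assumed to be the all-time view count
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_URL + "?" + urlencode(params)


def total_views(response):
    """Add up the view counts of every copy in a parsed search response."""
    docs = response["response"]["docs"]
    return sum(int(doc.get("downloads", 0)) for doc in docs)


def fetch(url):
    """Download and parse a search response (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Example use (not run here because it needs the network):
#   result = fetch(build_query_url("title", "Heart of Darkness"))
#   print(total_views(result))
```

The title in the usage comment is just a placeholder. As with the manual method, the total favours books that have been on the Internet Archive for a long time.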
At Project Gutenberg:
Project Gutenberg is a popular site devoted to public domain books. The collection is small compared to Internet Archive, with only 70,000 books. About 12,000 of their books are already on Librivox.
To get traffic metrics with Project Gutenberg, try their Advanced Search page (www.gutenberg.org/ebooks/). It’s easy to search by subject. Try your subjects in the ‘Subject’ field or select an ‘LoCC’ (Library of Congress Classification). Search results are listed with author and title, and can be sorted by either.
Choosing an author gives you all the books at Gutenberg by that author. Choosing a title allows you to see metadata, including the number of downloads in the last 30 days.
With these searches of Internet Archive and/or Project Gutenberg, you should be able to find a popular title. The next step is to find search terms to use for your audiobook.
Selecting Topics for Your Audiobook
You have now selected a title to record, and need to fill in the metadata in the ‘Project Template Generator’. The fields of that form that are relevant here are Author, Topics, and Brief Summary. Because Internet Archive’s search capabilities are better than Librivox’s, Book Coordinators (BCs) should keep them in mind when filling in these fields, as this could make a huge difference in the long-term circulation of their books. Here are some tips:
Author Field
Usually the full name of the author is used here. It’s a good idea to check whether the author’s name sometimes appears in different forms; e.g. with or without a middle name, or initials in place of the first and middle names. Put the most frequently used version in the author field, and put any commonly-used alternate versions in the Summary box.
In books with many authors, such as short story or poetry collections, put all the authors’ names into the summary, to help visitors find them in searches.
Topics Field
Internet Archive Info says you can add up to 10 terms in the ‘Topics’ field in a book’s metadata. That may seem excessive, but you’ll see below why it’s a good idea to use many topics.
The first two mandatory topics will be ‘librivox’ and ‘audiobooks’. (Note that ‘audiobooks’ is plural.) So you now have only 8 topics remaining.
Identify the form of the book. By “form”, I mean categories like poetry, short story, nonfiction, mythology, literature, etc. Don’t make a topic of this yet.
Identify the subjects that you want people to search for. Examples: romance, adventure, Dutch history, law enforcement, etc. Keep in mind that narrow subjects are more accurate, but will be missed in broader searches carried out by potential readers. Most works fit well into two or more completely different subjects, so try to identify all the major subjects.
Look up your book in ‘Google Books’. Go to the metadata page and scroll down to ‘Common Terms and Phrases’. This word cloud can help you identify the main subjects. So can chapter headings.
Choosing a short list of subjects for Topics will take some work. You can list many topic phrases, using the best ones for your Topics field and putting the others in the Summary field. When someone searches specifically for subjects, with subject:"your subject here", that search will only see your topics. A general search, or a search with description:, will also catch the terms in the Summary.
Multi-word Topics Are Good
This recommendation will be a departure from a common practice in Librivox. Try combining the form of media with a subject into one topic, or search term. For example, use ‘travel short stories’ instead of making two topics of ‘travel’ and ‘short stories’. There is no search benefit in Internet Archive for making terms like ‘short stories’ or ‘travel’ into separate topics. Searching ‘short stories’ gets 1,571 results, and ‘travel’ gets almost 600 results, many of them nonfiction books. These groups of search results are too big to review, but ‘travel short stories’ is more precise, and gets only 37 results.
‘Literature’ gets over 4,000 results, and ‘fiction’ 3,400, so also make topics that put these terms together with the main subject, rather than listing them separately.
No single search will turn up all the books in Librivox on that subject, because there are often several words used in audiobook metadata for the same subject. Try to determine, for your main subjects, the various search terms commonly used. You can do this by searching for subjects in the IA Librivox collection, then hovering your cursor over books in the results to see the topics in books similar to yours. Write down all the good ones.
For each subject in your book, use one or two commonly-used phrases as Topics, and put any good variants in your Summary box. Try to select search terms broad enough that they will be used often, but not so broad that there are too many titles in the results for a visitor to easily review.
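Writing down topics by hand works, but the tallying can also be sketched in a few lines of Python. The sketch below assumes you already have a list of search results in which each book has a `subject` entry, either a list of topics or one string with topics separated by semicolons. That layout, the sample data, and the function name are assumptions for illustration.

```python
from collections import Counter

# Topics every Librivox audiobook carries, so they tell us nothing.
IGNORED = {"librivox", "audiobooks", "audiobook"}


def tally_topics(docs):
    """Count how often each topic appears across a list of book records."""
    counts = Counter()
    for doc in docs:
        subjects = doc.get("subject", [])
        if isinstance(subjects, str):
            subjects = [subjects]
        for entry in subjects:
            # One entry may hold several topics separated by semicolons.
            for part in entry.split(";"):
                term = part.strip().lower()
                if term and term not in IGNORED:
                    counts[term] += 1
    return counts


# Hypothetical records standing in for real search results.
books = [
    {"subject": "librivox; audiobooks; Poetry; nature"},
    {"subject": ["poetry", "Sonnets"]},
    {"subject": ["librivox", "audiobooks", "poetry"]},
]
for topic, count in tally_topics(books).most_common(3):
    print(topic, count)
```

The most frequent terms are candidates for your Topics field; the less common variants can go in the Summary.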
When you finish this you should have several terms to use as Topics. However, don’t use up all of your 8 allowed Topics, because there is one more step: LOC Subject Headings.
Library of Congress Subject Headings
If you researched book popularity in the Internet Archive at the section above, you may have noticed the librarian terms that the Internet Archive often uses for topics in its collection of millions of books. These are mostly Library of Congress (LOC) Subject Headings. LOC subject headings have the big advantage of employing a single standardized term for one subject. If that LOC subject heading has been added consistently to books, searching with it will capture all the books in a collection on that subject.
The disadvantage in using LOC subject headings is that they are not commonsense search terms. People aren’t familiar with them, so they need prompting. Internet Archive does that prompting in their book collections by providing them as highly visible links in book ‘Topics’. If a visitor uses a commonsense term in the search line, the books that appear in search results show the LOC subject headings, which the visitor then clicks for further searches. It’s a good system.
I think Librivox should move toward adopting the same system. If you add some LOC subject headings to your topics field now, your book will be easier for Librivox audiobook users to find in the future.
Of more immediate benefit to you, putting LOC subject headings among the topics in your audiobook will get it into results of searches by hundreds of thousands of regular book users at Internet Archive.
Project Gutenberg, another major free book outlet, has also adopted LOC Subject Headings. The Gutenberg collection consists entirely of public domain books, like Librivox, and over half the existing Librivox titles are in Gutenberg. LOC subject headings have been assigned to each book, and you can use Gutenberg to find subject headings for your book. Search for your title there, and scroll down through the book’s metadata to see the LOC subject headings that were used. If you don’t find your title, use their ‘Advanced Search’ page (www.gutenberg.org/ebooks/) to look for other books on your subject.
When your subject search turns up a list of book titles, click on the title of a book similar to yours and scroll down to see the multiple LOC subject headings assigned to those books. If you find one that looks right, click on it to see how many books Gutenberg has on that subject. When you’re trying to decide if that subject term is too broad or too narrow to use, remember that Gutenberg has 70,000 books, more than three times the number now held by Librivox.
You can also find LOC subject headings by doing a title search at Open Library (openlibrary.org). They have entries both for books they have and books they don’t hold. Focus only on the books they hold in their collection.
When You’re Finished with Topics …
At the end, you should have 10, or nearly 10, topics. Namely: ‘librivox’, ‘audiobooks’, about 5 or 6 commonsense search terms (preferably multiple-word terms), and LOC subject headings for the remainder. And as mentioned above, if you have more than 10, add the extras to the Summary field.
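The slot arithmetic can be sketched as a small Python helper. The names and the default of two slots reserved for LOC headings are my own assumptions; the limit of 10 topics and the two mandatory topics come from the text above.

```python
MANDATORY = ["librivox", "audiobooks"]
MAX_TOPICS = 10


def split_topics(search_terms, loc_headings, reserve_for_loc=2):
    """Split candidate terms into (topics, overflow_for_summary).

    Both input lists should be ordered best-first. Some slots are held
    back for LOC subject headings; whatever does not fit into the
    10 topics is returned as overflow for the Summary field.
    """
    free_slots = MAX_TOPICS - len(MANDATORY)
    loc_slots = min(reserve_for_loc, len(loc_headings))
    term_slots = min(free_slots - loc_slots, len(search_terms))
    # Any slots the search terms did not need go to more LOC headings.
    loc_slots = min(free_slots - term_slots, len(loc_headings))
    topics = MANDATORY + search_terms[:term_slots] + loc_headings[:loc_slots]
    overflow = search_terms[term_slots:] + loc_headings[loc_slots:]
    return topics, overflow


# Hypothetical candidates, best first.
terms = ["travel short stories", "adventure fiction", "sea stories",
         "africa travel", "exploration", "voyages", "rivers"]
loc = ["Africa -- Description and travel", "Voyages and travels", "Explorers"]
topics, for_summary = split_topics(terms, loc)
print(topics)
print(for_summary)
```

With seven search terms and three LOC headings, this keeps six search terms and two LOC headings as topics and sends the rest to the Summary.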
If you add all these topics to the Topics and Summary fields, your audiobook has an excellent chance of being found by interested listeners for years to come.
See the Number of Views Each Day of any Librivox Audiobook
You can create a traffic report from Internet Archive for any audiobook, showing the number of views on every date since Jan 1, 2017, plus the total number of views before 2017. Here are the steps for making the report and putting the data into a spreadsheet for analysis.
1. In the Internet Archive Librivox collection (https://archive.org/details/librivoxaudio) click on any book, including your own.
2. In the metadata (lower) portion of the book’s page, scroll down to ‘Identifier’. Copy the identifier.
3. Paste the identifier at the end of https://be-api.us.archive.org/views/v1/long/ in the address bar of your browser and press Enter.
4. You’ll initially see a screen full of numbers. Wait 5 seconds and you’ll then see two columns of numbers and dates, starting with ‘0’ and ‘2017-01-01’. Make sure your screen is set to ‘JSON’ and ‘Expand All’.
5. Copy and paste both columns into a spreadsheet, down to the most recent date. You’ll have more than 2,620 rows. Then (switching back to the report screen) roll them up by clicking on ‘days’.
6. Now you should see two columns of data, with headings ‘non_robot:’ and ‘per_day’. This is the main data you want, but before you copy it, we have to eliminate some data that is below this section.
7. Scroll down past line 2620. These two columns of data end, followed by another set of headings, and the ‘per_day’ column starts over at 0. There’s another heading of ‘pre2017’. Roll that up (click on the heading). If the book was published before 2017, there is no day-by-day account available before 2017, but you’ll find a pre-2017 total here for the number of views. Make a note of that total.
8. Then there’s a heading for ‘robot’. Click it to roll that up.
9. After ‘robot’ you’ll see ‘unrecognized’. These numbers appear to be valid. If you want accurate data, you need to add these to the non-robot figures, in the same way as in step 10. However, they are a fraction of the total, so you can roll this up if you’re mainly interested in seeing trends.
10. You should now be viewing two columns of data for ‘non_robot’ and ‘per_day’. Copy the columns and paste them into your spreadsheet, aligning day 0 in the same row as the day 0 that is already in your spreadsheet.
11. You should now have 4 columns of data in your spreadsheet with more than 2,620 rows. Two columns are duplicates that show the day number assigned to each date. You can delete one of those columns, leaving you with 3 columns: a column showing a date, a column with a day number assigned to that date, and the number of views that day.
Congratulations! You now have your data where you can analyze and manipulate it as you wish.
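If you would rather skip the copying and pasting, the same report can be sketched in Python. The URL and the headings ‘days’, ‘non_robot’, ‘per_day’ and ‘unrecognized’ come from the steps above; exactly how those headings are nested inside the report is not something I have documented, so this sketch simply searches the whole report for each heading. The function names are my own.

```python
import csv
import json
from urllib.request import urlopen

VIEWS_URL = "https://be-api.us.archive.org/views/v1/long/"


def fetch_report(identifier):
    """Download the views report for one identifier (needs network access)."""
    with urlopen(VIEWS_URL + identifier) as resp:
        return json.load(resp)


def find_key(obj, key):
    """Return the first value stored under `key` anywhere in nested JSON."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def daily_rows(report, include_unrecognized=True):
    """Return (day_number, date, views) rows, one per day in the report."""
    days = find_key(report, "days")
    views = list(find_key(report, "non_robot")["per_day"])
    if include_unrecognized:
        section = find_key(report, "unrecognized") or {}
        extra = section.get("per_day") or [0] * len(views)
        views = [v + u for v, u in zip(views, extra)]
    return [(n, day, count) for n, (day, count) in enumerate(zip(days, views))]


def write_csv(rows, path):
    """Save the rows as a CSV file that any spreadsheet can open."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["day_number", "date", "views"])
        writer.writerows(rows)


# Example use (not run here because it needs the network):
#   rows = daily_rows(fetch_report("your-identifier-here"))
#   write_csv(rows, "views.csv")
```

The pre-2017 total is not handled here; note it down from the report screen as described in the steps.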
Six Hares and a Turtle – A Story about Book Circulation
I used the Internet Archive’s report format (explained just above) to get viewing data for books published on 1 October and 3 October 2017. The reports show the number of views every day for all the books. I put the data into a spreadsheet to see the trend in views over time.
The first 6 books followed a general trend, and the last was an exception to that trend. This sample of books is not large enough or random enough to accurately reflect the trends for all 19,000 books in the collection, but the data is of interest nonetheless.
Period after Publication | The Hares (6 Books): Avg Views | The Hares: % of 2nd Month Views | The Turtle: Views | The Turtle: % of 2nd Month Views |
---|---|---|---|---|
1st Week | 798 | n/a | 227 | n/a |
2nd Month | 1151 | 100% | 3074 | 100% |
7th Month | 555 | 48% | 1844 | 60% |
12th Month | 121 | 11% | 1132 | 37% |
24th Month | 77 | 7% | 936 | 30% |
36th Month | 56 | 5% | 756 | 25% |
48th Month | 37 | 3% | 552 | 18% |
60th Month | 37 | 3% | 558 | 18% |
In their first week after publication, these 6 books were at their peak in popularity, averaging 798 views for the week, or 114 views per day. In their 2nd full month after publication they averaged 1,151 views, or 38 views per day. As you can see, views continued to drop steadily until settling at 37 per month, or just over 1 per day, within four years.
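For anyone who wants to check the percentages and monthly per-day figures, here is a short sketch of the arithmetic. The monthly numbers are copied from the table; the 30-day month is my assumption for converting monthly views to daily views.

```python
# Monthly views from the table, keyed by month after publication.
HARES = {2: 1151, 7: 555, 12: 121, 24: 77, 36: 56, 48: 37, 60: 37}
TURTLE = {2: 3074, 7: 1844, 12: 1132, 24: 936, 36: 756, 48: 552, 60: 558}
DAYS_PER_MONTH = 30  # assumption: a nominal 30-day month


def percent_of(base, value):
    """Express `value` as a whole-number percentage of `base`."""
    return round(100 * value / base)


for month in HARES:
    hare_pct = percent_of(HARES[2], HARES[month])
    turtle_pct = percent_of(TURTLE[2], TURTLE[month])
    per_day = HARES[month] / DAYS_PER_MONTH
    print(f"month {month:>2}: hares {hare_pct:>3}% ({per_day:.1f}/day), "
          f"turtle {turtle_pct:>3}%")

# How many times more traffic the Turtle gets than the average Hare
# in the 60th month.
ratio = TURTLE[60] / HARES[60]
print(f"Turtle vs. Hares at month 60: about {ratio:.0f}x")
```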
While this is a very small and unrepresentative sample, the results are consistent with info provided by the SEO site, Ubersuggest. I learned from it that 92% of books in the Internet Archive’s Librivox Collection get no more than 1 view per day.
Why do Views Decline So Much?
It might seem normal to us for books to decline in popularity a year or two after they first appear. When new books are published, there are reviews in newspapers and magazines, books appear on the ‘New Books’ shelf at libraries, and if you socialize with readers, you hear about new books. All that ‘chatter’ declines with time, and books gradually become less popular.
But none of those factors apply to Librivox books. They are virtually all over 100 years old when Librivox releases them. The great majority of titles and authors are today unknown, and their ‘publication’ on Librivox or Internet Archive doesn’t generate any public buzz.
It seems clear that the initial surge in popularity is a result of many ‘fans’ of Librivox books watching for the release of new books, and browsing from the top of the list when it is sorted by date of publication. When people open a book entry to learn more, that is counted as a ‘view’, regardless of whether they listen to the book. As books stay in the catalog, they sink far from the top of the list, and fewer and fewer people find them by browsing. Once books sink too deep to appear in a casual browse, circulation depends entirely on people finding them through searches.
How Some Books Buck the Downward Trend
There are many exceptions to this dismal slide to one view per day. One of the books published in early October, which I called ‘The Turtle’ in the table above, was “Short Poetry Collection #172”. It got off to a slow start in its first week, possibly because it was released without its cover. Lack of a cover discourages people from viewing a book at the exact time when it should be attracting maximum attention. But the Short Poetry Collection had a good 2nd month, and then began to decline. However, the decline was much more moderate than with the other books. Even after five years it is viewed more than 500 times per month, 15 times the level of traffic of the six ‘Hares’.
Why do so many people still find this audiobook and listen? Undoubtedly it is because the Book Coordinator listed every poem and author in the summary, so the book continues to turn up in searches for some of the popular authors. To date, the poetry collection has had 71,000 views, and will likely continue to get thousands of views per year.