I don’t do Microsoft Windows. Android and Linux are My Things. I subscribe to ARRL to get the digital version of their excellent QST magazine. Unfortunately, their support for offline reading in Linux is zero and their Android app is so poor as to be unusable. So I want to download QST in PDF so that I can read it easily offline on my desktop and mobile devices (Ubuntu Linux and Android respectively). ARRL doesn’t make that easy for you, but I have the solution.
Prior Art
For Windows users I found this article which describes a Python 2 script for converting a downloaded and expanded Adobe AIR version of QST Magazine into PDF. Unfortunately that combines two of my least favourite things, two things that I would gladly see nuked to oblivion tomorrow: Microsoft Windows and Python 2.
Linux Friendly Development
I thought I might as well get the script to do as much work for me as possible – after all, that is what computers are for, right? Firstly I Python 3- and *[ui]x-ised it (no fixed paths to executables, no back-slash path separators, etc.). I also tidied the code a little, removing things like global variables (shudder) except for configuration. Then I got the script to be responsible for downloading the AIR archive, unzipping it, doing all the processing as per the previous script, and then cleaning up all of the rather humongous intermediate cruft after itself. I also fixed a ‘feature’ where the script used an infeasible amount of memory when making the PDF from bitmaps. Finally I fixed up the creation of bookmarks in the final PDF. Here’s the result:
"""
May 2012 by Stephen Genusa http://development.genusa.com
October 2014: Modifications by Freakyattic.com
April 2015: Linux-ification plus enhancements by marcusjenkins.com

Written for Python 3.4

Usage:
    python3 nxt2pdf.py <URL of AIR file>

This script is built on the shoulders of giants - you will need to
apt-get install the following packages for Ubuntu:
    unzip
    swftools
    ghostscript
    imagemagick
    pdftk
    wget
    python3
"""
import glob
import os
import re
import shutil
import sys
import xml.dom.minidom
tmp_dir = 'nxttmp'

def get_text(nodelist):
    # Transform nodelist into text if TEXT_NODE elements exist in nodelist
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handle_page(page, page_number):
    # Render one page's SWF file to a PNG bitmap with swfrender
    t_filename = page.getElementsByTagName('file')[0].getAttribute('value')
    t_fullpath = tmp_dir + "/nxtbook/" + t_filename
    command = 'swfrender -X2880 {} -o {}/Filespage{:03d}.png'.format(
        os.path.realpath(t_fullpath), tmp_dir, page_number)
    os.system(command)

def handle_pages(pages):
    # Convenient "for loop" to loop through page elements
    print('Converting {} pages to bitmaps...'.format(len(pages)))
    for page_number, page in enumerate(pages, start=1):
        if page_number == 1:
            # Skip the first page of QST which is always a garbage flyer
            # these days
            continue
        sys.stdout.write('.')
        sys.stdout.flush()
        handle_page(page, page_number)
    print()

def read_nxt_book_xml_file(strFileName):
    if os.path.exists(strFileName):
        t_dom = xml.dom.minidom.parse(strFileName)
        t_contents = t_dom.getElementsByTagName('contents')[0]
        t_contents_items = t_contents.getElementsByTagName('item')
        print("Creating bookmarks file...")
        # Write one pdftk bookmark record per table-of-contents item
        with open('{}/bookmarks.txt'.format(tmp_dir), mode='w') as fh_bookmarks:
            for item in t_contents_items:
                t_bookmarktext = item.getAttribute('text')
                # Decode leftover entities and normalise apostrophes
                # (the "&apos;" entity name is an assumption)
                t_bookmarktext = t_bookmarktext.replace("&amp;", "&")
                t_bookmarktext = t_bookmarktext.replace("&apos;", "'")
                t_bookmarktext = t_bookmarktext.replace("’", "'")
                t_pagenum = item.getAttribute('folio')
                t_pagenum = t_pagenum.replace("Cover", "")
                fh_bookmarks.write("BookmarkBegin\n")
                fh_bookmarks.write(
                    "BookmarkTitle: {}\n".format(t_bookmarktext))
                fh_bookmarks.write("BookmarkLevel: 1\n")
                t_pagenum = int(t_pagenum)
                if t_pagenum != 1:
                    t_pagenum += 2
                fh_bookmarks.write(
                    "BookmarkPageNumber: {}\n".format(t_pagenum))
        t_booktitle = t_dom.getElementsByTagName('book')[0].getAttribute(
            'title')
        t_booktitle = t_booktitle.replace('+', ' ')
        handle_pages(t_dom.getElementsByTagName('page'))
        return t_booktitle

def create_pdf(book_title):
    print("Converting the bitmaps to PDF... ")
    # Process one page at a time so as not to go into an out-of-memory
    # melt-down
    bitmap_files = glob.glob('{}/Files*.png'.format(tmp_dir))
    for bitmap_file in sorted(bitmap_files):
        output_file = re.sub(r'(^.*?)\.png$', r'\1.large.pdf', bitmap_file)
        os.system('convert {} {}'.format(bitmap_file, output_file))
        # Shrink the PDF
        input_file = output_file
        output_file = re.sub(r'\.large\.pdf', '.pdf', input_file)
        command = ('gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 '
                   '-dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH '
                   '-sOutputFile={} {}').format(output_file, input_file)
        os.system(command)
        os.unlink(input_file)
        os.unlink(bitmap_file)
    print("Stapling the PDF pages together...")
    output_file = re.sub(r' ', r'_', book_title)
    output_file += '.pdf'
    os.system('pdftk {0}/*.pdf cat output {0}/{1}'.format(tmp_dir, output_file))
    print("Adding bookmarks to PDF...")
    os.system('pdftk {0}/{1} update_info {0}/bookmarks.txt output {1}'.format(
        tmp_dir, output_file))
    shutil.rmtree(tmp_dir)

def download_air_file(url):
    # Start from an empty temporary directory
    if os.path.exists(tmp_dir):
        shutil.rmtree(tmp_dir)
    os.makedirs(tmp_dir)
    original_directory = os.getcwd()
    os.chdir(tmp_dir)
    os.system('wget {}'.format(url))
    # The downloaded file is named after the last component of the URL
    downloaded_file_name = re.sub(r'^.*/(.*?)$', r'\1', url)
    os.system('unzip -q {}'.format(downloaded_file_name))
    os.chdir(original_directory)
    return '{}/nxtbook/book.xml'.format(tmp_dir)

################################################################################
if len(sys.argv) != 2:
    print('Usage:')
    print('\tpython3 nxt2pdf.py <URL of AIR file>')
    sys.exit(-1)
xml_file = download_air_file(sys.argv[1])
book_title = read_nxt_book_xml_file(xml_file)
create_pdf(book_title)
print('Done!')
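
The bookmark handling is the least obvious part of the script, so here is a minimal sketch of that step in isolation. It assumes the record layout the script writes for pdftk's update_info and the same page offset (every folio other than 1 is shifted by 2). The helper name bookmark_record and the sample title are made up for illustration and are not part of the script.

```python
def bookmark_record(title, folio):
    """Return one pdftk bookmark record for a table-of-contents entry.

    Mirrors the script's offset rule: folio 1 stays on page 1, every other
    folio is shifted by 2 pages. bookmark_record is a hypothetical helper.
    """
    page = folio if folio == 1 else folio + 2
    return ("BookmarkBegin\n"
            "BookmarkTitle: {}\n"
            "BookmarkLevel: 1\n"
            "BookmarkPageNumber: {}\n").format(title, page)


if __name__ == "__main__":
    # "Some Article" is a placeholder title, not taken from a real issue
    print(bookmark_record("Some Article", 30), end="")
```

Writing each record as a single string makes it easy to see that every field ends in a newline, which pdftk needs in order to tell the fields apart.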
To use this script, log in to your ARRL account in your browser, go to QST and then QST Archive and then start viewing the edition you want as PDF. Then click the download button which opens another tab in your browser – on that page there is a “Download and install your offline application” link. Right-click to get the URL and use that as the parameter for this script. On my PC and internet connection it takes a good 45 minutes to download and reformat each month’s magazine.
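
One caveat with that URL: the script hands it to the shell through os.system, so quote it on the command line, and a URL carrying a query string would confuse the file-name regex. Here is a rough sketch of a more defensive variant, assuming Python 3.5 or later (for subprocess.run) and the same wget. The function names and the example.com URL are invented for illustration.

```python
import posixpath
import subprocess
from urllib.parse import urlsplit


def archive_name_from_url(url):
    """Return the last path component of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def wget_command(url):
    """Build the wget call as an argument list so no shell quoting is needed."""
    return ['wget', '-O', archive_name_from_url(url), url]


def download(url):
    # check=True raises CalledProcessError if wget exits with an error
    subprocess.run(wget_command(url), check=True)
    return archive_name_from_url(url)


if __name__ == "__main__":
    # Hypothetical URL, only here to show the name extraction
    print(archive_name_from_url('https://example.com/pubs/issue.air?token=abc'))
```

Passing a list to subprocess.run bypasses the shell entirely, so characters such as & and ? in the URL reach wget untouched.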
This would all be much easier if ARRL just provided the thing in PDF in the first place. Frankly, I don’t understand why they don’t do this since they provide whole books (e.g. the Antenna Book and the Handbook) with a CD with the whole text of the book in PDF when you buy the print book. I used to be able to use the web interface to print the whole magazine to PDF which was a bit laborious, but OK. Just recently they limited the print from the Flash viewer to 60 pages. I can only conclude ARRL wants to flip the bird to their legitimate, honest, fee-paying members. That was the final straw. Right back at you, ARRL. Like those FBI warnings at the beginning of DVDs, this DRM and proprietary e-reader nonsense just annoys and inconveniences legitimate, paying customers and does nothing to prevent piracy in any way at all.
Exercise for the reader: incorporate pdfocr to make the PDF searchable and/or write to ARRL and demand they just let subscribers download PDFs generated from the original source files which will be a compact, searchable document for enjoying on all sorts of platforms. Not just Windows and Apple.
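
A possible starting point for the first half of that exercise. Note that this sketch does not use pdfocr: it swaps in OCRmyPDF, a different tool whose basic invocation is `ocrmypdf input.pdf output.pdf`, and assumes it is installed. The function names and file names are placeholders.

```python
import subprocess


def ocr_command(input_pdf, output_pdf):
    """Build an OCRmyPDF call that writes a searchable copy of input_pdf."""
    return ['ocrmypdf', input_pdf, output_pdf]


def make_searchable(input_pdf, output_pdf, run=subprocess.run):
    # run is injectable so the function can be exercised without OCRmyPDF
    run(ocr_command(input_pdf, output_pdf), check=True)
    return output_pdf
```

Calling make_searchable on the stapled PDF, after the bookmarks have been added, would be the natural place to hook this in.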
73 and good DX.