What is PhantomJS?

PhantomJS is a free, scriptable, headless, stand-alone Web browser that can be used to add screen capture and Web page conversion to your 4D systems. What does all of that mean? If you've got a live or static Web page and want to turn it into a PDF or image file for storage or distribution, PhantomJS can help. Below are a few example uses:

  • Generate a report on-the-fly, convert the results to PDF and email the results to users for off-line review.
  • Render a live Web page as a static PDF for archival or distribution.
  • Generate a Highcharts area and convert the results to PDF.
  • Convert an SVG file into GIF, JPEG, PDF, or PNG.

Background: Solving an Immediate Problem Quickly

I found out about PhantomJS from Keith White while trying to solve a specific problem under a short deadline. The original requirements came from a system that helps organize applications to a university graduate program. Data is collated from the students, various departments, outside references, testing agencies, and so on. Much of the data is delivered and stored as scanned documents in PDF format, some is collected over the Web using Wufoo & 4D, and some is imported from CSV files. The ultimate goal is to produce a single PDF document for each applicant with all of the scans and other data merged together with an introductory cover sheet. In most cases, converting 4D data to PDFs is pretty easy, at least under OS X. There are a lot of good options involving various combinations of 4D's native printing tools, including SET PRINT OPTION, Print object, Print form, OBJECT GET BEST SIZE and so on. The hard part is the cover sheet. It needs to include short data fields (numbers, strings, and dates), variable-sized text fields, and rows from about a dozen related tables. The deadline was too short for me to master the intricacies of printing multiple variable-height objects of different kinds with native 4D code.

Being comfortable with converting 4D data into HTML, I couldn't stop thinking about a Web solution. HTML is easy to code and it's a natural medium for rendering multiple variable-height elements. Unfortunately, a Web solution is out of the question when the final report must be a PDF. The dynamic report is the cover page for a larger final document that combines the dynamically generated cover sheet with a collection of generated and stored PDFs. Without a native way to convert HTML into PDF, producing an HTML report is a dead end. Yes, 4D has an embedded WebKit browser, but no printing features are exposed or supported. (WA PRINT AREA or WA AREA TO PDF commands would be fantastic!) Enter PhantomJS to the rescue!

Advantages of PhantomJS

PhantomJS has a stack of great features and a lot of advantages. Some of the most obvious are listed below.

  • Free
  • Cross-platform
  • Stand-alone. Translation: You don't have to run an installer on individual machines.
  • Widely used. If you've got a problem or a question, you can quickly find examples and discussions dating back years.
  • Fantastically convenient, if you're already dealing with custom HTML or existing Web pages.
  • Fills a gap in 4D's command-set.
  • You do not need to know JavaScript to make this system work: the rasterize.js and printerheaderfooter.js scripts are included in the solution. These scripts are similar, but the latter supports adding a custom header and/or footer to PDFs generated by PhantomJS. Note that apart from the page number information added by printerheaderfooter.js, PhantomJS is not changing the source HTML or SVG in any of the examples.

A few words on JavaScript are worth adding here. PhantomJS is a stand-alone executable with a command line interface. You do not need to use or know JavaScript to use this tool. The sample database uses a couple of scripts that simplify setting up page dimensions, rendering images, printing footers, and various other print-related details. These short scripts take advantage of the PhantomJS JavaScript API. Even if you don't know JavaScript, you may find that many of the example scripts are simple enough to alter for your purposes. If that's of no interest to you, the scripts included with the sample database support all of the rendering and conversion features discussed here.
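As a rough illustration of the kind of print setup such scripts perform, here is a minimal sketch of a page size with a page-number footer. It is not the script shipped with the sample database. The names footerHtml and buildPaperSize are hypothetical, and the margin and footer height are arbitrary values; page.paperSize and phantom.callback are the parts supplied by PhantomJS.

```javascript
// Sketch of a paper-size setup with a page-number footer for PhantomJS.
// footerHtml and buildPaperSize are illustrative names, not part of any
// shipped script; margin and footer height are arbitrary choices.

// Return the HTML for one footer, given the current and total page numbers.
function footerHtml(pageNum, numPages) {
  return '<div style="text-align:right; font-size:9pt;">Page ' +
    pageNum + ' of ' + numPages + '</div>';
}

// Build the object assigned to page.paperSize. Inside PhantomJS, 'wrap'
// should hand the footer function to phantom.callback; the default is a
// pass-through so the function can be tried outside PhantomJS.
function buildPaperSize(format, wrap) {
  wrap = wrap || function (fn) { return fn; };
  return {
    format: format,
    orientation: 'portrait',
    margin: '1cm',
    footer: { height: '1cm', contents: wrap(footerHtml) }
  };
}

// This part only runs inside PhantomJS, where the global 'phantom' exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  page.paperSize = buildPaperSize('Letter', function (fn) {
    return phantom.callback(fn);
  });
  // args[1] is the source file or URL, args[2] the target PDF path.
  page.open(system.args[1], function (status) {
    if (status === 'success') {
      page.render(system.args[2]);
    }
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Keeping the footer markup in its own small function makes it easy to change the wording or styling without touching the rest of the script.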

PhantomJS: What You Need

To get PhantomJS working with 4D, you need the following tools and commands:

The sample database includes everything you need to get started and we'll have a quick look at it shortly.

Sample Database

The sample database includes everything you need to get started with PhantomJS. The 4D database is fairly small but does not work without the PhantomJS binaries. You can download the database alone (305 KB Zip) or the database with PhantomJS 1.9.2 for Windows and OS X (16.8 MB Zip). Within the database, you'll find a lot of useful code and some simple screens for experimenting with some of the available functionality. The following screenshots should give you an idea of the features demonstrated in the database's GUI. Internally, there's a lot more code beyond these demo screens.

Convert an HTML File to PDF

HTML to PDF Demo Screen

This screen exercises the abilities of the PhantomJS_RenderFile routine. Under the hood, you'll find a lot of utility code for handling paths, LAUNCH EXTERNAL PROCESS, and switching between PhantomJS scripts. You can use a lot of these functions from code or a GUI without digging into the code, but feel free to explore. The sample line below shows what happens at the command line:

phantomjs rasterize.js report.html report.pdf letter

The fragment below illustrates a 4D call to do the same thing:

$error_name:=PhantomJS_RenderFile ($source_html_path;$target_pdf_path;"Letter")
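To make the moving parts concrete, here is a minimal sketch of how a rasterize-style script can interpret the three arguments from the command line above and render the result. It is not the rasterize.js used by the sample database. The parseArgs helper and its field names are hypothetical, and falling back to Letter when no paper format is given is an assumption made for the sketch.

```javascript
// Sketch of a rasterize-style script; not the rasterize.js that the sample
// database uses. parseArgs and its field names are illustrative only.

// Turn [script, source, target, paperFormat] into a small job description.
function parseArgs(args) {
  if (args.length < 3) {
    return null; // source and target are both required
  }
  var target = args[2];
  var isPdf = /\.pdf$/i.test(target);
  return {
    source: args[1],
    target: target,
    // A paper format only applies to PDF output. Defaulting to Letter is
    // an assumption of this sketch.
    paperFormat: isPdf ? (args[3] || 'Letter') : null
  };
}

// This part only runs inside PhantomJS, where the global 'phantom' exists.
if (typeof phantom !== 'undefined') {
  var job = parseArgs(require('system').args);
  if (!job) {
    console.log('Usage: script.js source target [paperFormat]');
    phantom.exit(1);
  } else {
    var page = require('webpage').create();
    if (job.paperFormat) {
      page.paperSize = {
        format: job.paperFormat,
        orientation: 'portrait',
        margin: '1cm'
      };
    }
    page.open(job.source, function (status) {
      if (status !== 'success') {
        console.log('Unable to load ' + job.source);
        phantom.exit(1);
      } else {
        // The output type follows the extension of the target file name.
        page.render(job.target);
        phantom.exit(0);
      }
    });
  }
}
```

Exiting with a non-zero code on failure gives the calling 4D code something simple to check after LAUNCH EXTERNAL PROCESS returns.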

Convert a URL to PDF

URL to PDF Conversion Demo Screen

This demo calls PhantomJS_RenderURL to convert a URL into a PDF file. If you review the code, you'll find that it's nearly identical to the HTML file to PDF conversion system. The sample below illustrates a command line call to convert a URL to a local file:

phantomjs rasterize.js "https://en.wikipedia.org/wiki/Musk_Lorikeet" Musk_Lorikeet.pdf

The snippet below illustrates a 4D call to do the same thing:

$error_name:=PhantomJS_RenderURL ("https://en.wikipedia.org/wiki/Musk_Lorikeet";$target_pdf_path)

SVG to Image Conversion Demo Screen

This window provides an interface for the PhantomJS_ConvertSVGFile routine. This gives you access to the PhantomJS renderer which is capable of converting SVG files into GIF, JPEG, PDF, or PNG format. Take a look at the code and you'll discover that it's nearly identical to PhantomJS_RenderFile and PhantomJS_RenderURL. PhantomJS and its supporting scripts are wonderfully flexible and concise. The rasterize.js script supports conversion and capture of pages, page areas, and images. For example, below is a concise command line version of what the SVG converter window does:

phantomjs rasterize.js tiger.svg tiger.png

In the sample database, the code looks something like this:

$error_name:=PhantomJS_ConvertSVGFile ($source_svg_path;$target_image_path)


SVG Tiger

PhantomJS and Web Areas

It's tempting but mistaken to think of PhantomJS as a drop-in substitute for the (currently imaginary) WA PRINT AREA and WA AREA TO PDF commands. 4D includes a version of WebKit but it does not expose printing functionality. Consider PhantomJS as a possible Web Area to PDF option only if you're able to work within the limits of HTML. The main issue to consider is referenced resources. Web pages may be constructed dynamically with JavaScript, and are very likely to include external images, style sheets and other resources. PhantomJS can render pages correctly, but only if it has access to all of the page's resources. If a Web Area displays a remote Web page, you can't render it properly by extracting the contents with WA Get page content unless all of the external links are absolute. On the other hand, if you populate the contents of the Web Area through custom code, you should also be able to render those contents separately with PhantomJS by moving them into a location where all linked resources are discoverable by PhantomJS. By the way, there's nothing wrong with PhantomJS here. It's just the nature of HTML and Web browsers that externally linked resources need to have valid paths and that such paths can be broken easily by local copying.
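One standard HTML way to keep relative links working in a local copy is a base element in the head, which tells the browser where to resolve relative URLs. This is a different technique from rewriting every link to an absolute URL, and it is not something the sample database does. The sketch below shows the idea; addBaseHref is a hypothetical helper and it assumes the extracted HTML has a head element.

```javascript
// Insert a <base> element so that relative links in extracted HTML resolve
// against the page's original location. addBaseHref is an illustrative
// helper; it assumes the markup contains a <head> element.
function addBaseHref(html, baseUrl) {
  // Leave the markup alone if a <base> element is already present.
  if (/<base[\s>]/i.test(html)) {
    return html;
  }
  var tag = '<base href="' + baseUrl.replace(/"/g, '&quot;') + '">';
  // Match <head> or <head ...>, but not <header>.
  var headOpen = /<head(\s[^>]*)?>/i;
  if (!headOpen.test(html)) {
    return html; // no <head>: return the input unchanged
  }
  // A replacer function avoids special handling of "$" in the URL.
  return html.replace(headOpen, function (open) {
    return open + tag;
  });
}
```

For example, passing the original page address as baseUrl lets an image referenced as images/logo.png load from the remote site even though the HTML itself is rendered from a local scratch file.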

PhantomJS and Custom Reports

Custom reports are the whole reason for talking about PhantomJS...and so far nothing's been said about custom reports. As it turns out, not much needs to be said. Think of PhantomJS as a sort-of print driver. It doesn't matter how you create or obtain the source HTML; the rendering (rasterizing) process is always the same. With the rendering code provided in the sample database, you can produce your HTML however you like, knowing that you've got a way to convert it into PDF. With all of that said, a few notes are worth adding:

  • Definitely experiment with CSS print styles.
  • Notice that links are rendered as static images, not clickable objects.
  • When PhantomJS renders a page that includes external resources, such as stylesheets and images, be sure that all of the resources are accessible. In the original system, report templates and scratch files are all collected in /Resources/Reports/. If this sounds useful to you, there's code supplied in the PDF_Reports theme to help locate and render stored templates and documents.
  • Scratch files are often a necessary pain when dealing with PhantomJS and custom reports. The database includes a set of routines in the Scratch_ theme that help to take the pain away. As implemented, a scratch folder is created at startup, if missing, and then cleared of all stale contents at shutdown. While 4D is running, call Scratch_GetScratchFilePath to get a unique scratch file. Optionally, you can specify a directory other than the default scratch folder as the target for your temporary file. This is particularly handy with dynamic HTML reports, as you can put the HTML in a scratch file in /Resources/Reports/ along with static style sheets, images, etc. for rendering. (Files created outside of the default scratch folder must be deleted manually.)
  • If you're comfortable making HTML templates for processing with PROCESS 4D TAGS, take a look at the PDFReport_HTMLTemplateToPDF routine. This all-in-one method loads a template, runs it through PROCESS 4D TAGS, and then calls PhantomJS to render the results as a PDF. This feature isn't demonstrated in the database and is included for developers with an interest in this strategy.

Limits of HTML, CSS and PhantomJS for Page Layout

Since its inception, HTML has been intended as a display and device-independent markup system. Ideally, the same page can be presented on screens of various sizes, screen-readers for the blind and other as-yet-to-be-invented equipment and interfaces. That's all to the good, but sometimes you need to print and you want your print-out to look decent. To accommodate this real-world requirement, the current CSS standards and modern browsers offer some support for printing-specific features. PhantomJS is based on WebKit and therefore supports the rendering features of WebKit. In my basic testing, external print stylesheets are recognized by the renderer but the application of print rules isn't perfect. (As a long-time FrameMaker user, tight control over page breaks and related formatting details is something I crave constantly in HTML.) As CSS, WebKit and PhantomJS are all evolving, it's worth experimenting with your own styles to see how well you can do. If you're new to the subject of print-specific CSS, below are links to a couple of introductions:

How To Set Up A Print Style Sheet

Print stylesheet - the definitive guide

page-break at CSS Tricks

The CSS properties to study are:

  • page-break-after
  • page-break-before
  • page-break-inside
  • orphans
  • widows
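Since the HTML is generated before rendering, one way to experiment with the items listed above is to inject a small block of print rules into the page. The sketch below does that with a hypothetical withPrintStyles helper. The selectors and values are placeholders to adapt, not tested recommendations, and how faithfully PhantomJS applies each rule is something to verify with your own output.

```javascript
// Illustrative print rules using the properties listed above. The
// selectors and values are placeholders, not tested recommendations.
var PRINT_CSS = [
  '@media print {',
  '  h2 { page-break-before: always; }',
  '  h3 { page-break-after: avoid; }',
  '  table, img { page-break-inside: avoid; }',
  '  p { orphans: 3; widows: 3; }',
  '}'
].join('\n');

// Insert the rules just before </head> in generated HTML. Markup without
// a closing </head> tag is returned unchanged. withPrintStyles is an
// illustrative helper name.
function withPrintStyles(html) {
  var at = html.search(/<\/head>/i);
  if (at === -1) {
    return html;
  }
  var block = '<style>\n' + PRINT_CSS + '\n</style>\n';
  return html.slice(0, at) + block + html.slice(at);
}
```

Keeping the rules in one string makes it quick to change a value, re-render, and compare the resulting PDFs.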

If you're having trouble getting PhantomJS to render pages as you would like, check the command line error output for lines like the following:

2014-01-27 10:29:51.929 phantomjs[4128:507] CoreText performance note: Client called CTFontCreateWithName() using name "Arial" and got font with PostScript name "ArialMT". For best performance, only use PostScript names when calling this API.
2014-01-27 10:29:54.998 phantomjs[4128:507] CoreText performance note: Client called CTFontCreateWithName() using name "Times New Roman" and got font with PostScript name "TimesNewRomanPSMT". For best performance, only use PostScript names when calling this API.
2014-01-27 10:29:55.002 phantomjs[4128:507] CoreText performance note: Client called CTFontCreateWithName() using name "Times New Roman" and got font with PostScript name "TimesNewRomanPSMT". For best performance, only use PostScript names when calling this API.

etc...

If you're generating the HTML and CSS before passing it to PhantomJS for rendering, there's no reason you can't tweak your HTML for the best results.

There's So Much More

This article and the accompanying demo only touch on some of the features of PhantomJS and its example scripts. For example, PhantomJS is used for network monitoring, automated testing, and screen capture. Take a look at the documentation and examples for more scripts and ideas. Search the Web and you'll find lots of discussions about PhantomJS from developers working in a wide range of languages solving an amazing variety of problems. PhantomJS is popular enough that there are projects built on top of it. If you're interested in building an automated workbench for JavaScript testing or Web scraping, take a look at CasperJS, a tool that simplifies automated testing, screen scraping, screenshot rendering and event logging. (Thanks to Julio Carneiro for pointing out CasperJS.)

Notes About the Sample Database

  • PhantomJS is a headless WebKit implementation that's normally installed server-side for automated print production, testing, and network monitoring. As such, you don't necessarily get a lot of progress feedback by default. The demo database uses 4D progress indicators to show activity but without any progress percentage.

    Progress Indicator
  • Much of the code in the demo database is carved out from a larger project. As such, it may have inconsistencies, inefficiencies and so on. Feel free to fix it, suggest changes, or adapt it as you like.
  • Explore and experiment with the PhantomJS scripts provided and others that you find or create. If you're comfortable with JavaScript, you can create a single print handling script that allows for dynamic header and footer contents.
  • Please share anything great that you discover or add.

The following picture of the Explorer Home page for the demo database should give you some idea of the overall range of code included:

4D Explorer Home Page

Notes and Limits

The notes and samples included here are more of an on-the-run demo than a comprehensive article. I got everything working fine for my situation but you'll very likely need to build and tweak on top of the starter code included here. I've added, and in some cases repeated, details to keep in mind if you apply this write-up to your own systems.

  • LAUNCH EXTERNAL PROCESS behaves slightly differently between Windows and OS X. Within the sample database you'll find a wrapper method named System_LaunchExternalProcess that simplifies calling LAUNCH EXTERNAL PROCESS. Check the method for notes about the method's optional features and undocumented behaviors of LAUNCH EXTERNAL PROCESS.
  • This code was built and tested under 4D V13.4. There's nothing that's specifically V13 about this code and it's likely to work in earlier and later versions of 4D with few changes.
  • My task was to generate HTML and then render the results. If you're loading live pages, you'll likely have to adjust timeout settings in the support scripts to allow the entire page to load before you render the page. Otherwise, you may get a rendering of an incomplete page. (If you're still having trouble with incomplete loads, check online for discussions of loading and rasterize.js.)
  • HTML and CSS still aren't great at the concept of a "page", even when styles are created for printing. Hiding objects before printing works well, as does resetting specific margins. Page break and widow/orphan control appear less reliable. Chances are, printing support will improve with future versions of WebKit, so it's possible that stylesheets created today will produce better output in the future without rework.
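On the timeout point above, the usual approach in a PhantomJS script is to wait a moment after the page reports that it has loaded and only then render. The sketch below shows that pattern under stated assumptions: renderAfterDelay is a hypothetical helper, and the 2000 ms pause is an arbitrary starting value to tune, not a recommended setting.

```javascript
// Render a loaded page after a pause so that late-arriving resources have
// a chance to finish. 'page' is a PhantomJS webpage object. 'schedule' is
// optional and defaults to a setTimeout wrapper. renderAfterDelay is an
// illustrative helper name.
function renderAfterDelay(page, target, delayMs, done, schedule) {
  schedule = schedule || function (fn, ms) { setTimeout(fn, ms); };
  schedule(function () {
    page.render(target);
    done();
  }, delayMs);
}

// This part only runs inside PhantomJS, where the global 'phantom' exists.
if (typeof phantom !== 'undefined') {
  var args = require('system').args;
  var page = require('webpage').create();
  // args[1] is the URL to load, args[2] the target file path.
  page.open(args[1], function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    // 2000 ms is an arbitrary starting point; tune it for your pages.
    renderAfterDelay(page, args[2], 2000, function () {
      phantom.exit(0);
    });
  });
}
```

A fixed pause is crude but simple. Longer delays slow every conversion, so it is worth finding the smallest value that renders your slowest page completely.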

PhantomJS Installation Notes

PhantomJS is a stand-alone file and no special installer is needed. This is a big help as you can include the PhantomJS program file in a built application, in the Resources folder under 4D or 4D Server, or anywhere else that you prefer. The screenshot below illustrates where the sample database expects PhantomJS to be located:

PhantomJS location

In your own application, you can place PhantomJS and any associated scripts and resources wherever you like. If you're starting from the code in the sample database, update the PhantomJS root folder in PhantomJS_GetDirectoryPath and the path to the executable in PhantomJS_GetCommandPath. If you change the subfolder names or organization, also update PhantomJS_GetScriptPath. For anyone that works with the supplied PDFReport code, the expected template folder path can be adjusted in PDFReport_GetDirectoryPath.

If you are deploying under OS X, take a look at Making Friends with LAUNCH EXTERNAL PROCESS on OS X for hints and details about telling OS X where to look for executables, strategies for dealing with path and POSIX problems, and resetting permissions under 4D Remote.

Resources

Let me Know

I'd love to hear from other people integrating PhantomJS into their 4D apps. Feel free to send me tips of your own or questions and I'll reply, as my schedule allows. I'm also a coder for hire if you need a hand.