A JavaScript Executable
There’s an interesting command-line browser called Lynx. It’s an extremely useful and versatile text-based browser. In Lynx, you can dump all links on a web page as references. That’s neat.
shell
$ lynx -dump -listonly https://example.com/
References
1. https://www.iana.org/domains/example
I’ve been meaning to replicate this feature: generate a list of references and stick it at the end of each post. This would make it possible to count the links on a page and to check for dead links (link rot), visually or automatically, in the future.
- Lynx
- A fully-featured World Wide Web (WWW) client for users running cursor-addressable character-cell display devices. (Lynx Users Guide)
Let’s make a quick-and-dirty command-line program that replicates the reference list feature of Lynx, but outputs HTML instead of text.
JavaScript with Deno Runtime
JavaScript is the master at traversing the DOM (Document Object Model). Pulling down a list of links is easy with native browser JavaScript APIs (Application Programming Interfaces). Deno is a versatile runtime for JavaScript and TypeScript that strikes just the right balance of primitives and abstractions to allow for a variety of use cases. Using Deno, we’ll compile a simple command-line program written in JavaScript that generates references. According to deno --version, I’m running Deno 1.23.0.
shell
$ deno --version
deno 1.23.0 (release, x86_64-unknown-linux-gnu)
v8 10.4.132.5
typescript 4.7.2
File and Directory Setup
The program will be called exoference. The source directory will contain the entry point main.ts, the TypeScript compiler options manifest tsconfig.json, and eventually a compiled binary that takes the name of the parent directory. A Deno project requires virtually no boilerplate (so good). Version 1.25 adds the command deno init for peak laziness.
shell
exoference/
|__ exoference
|__ main.ts
|__ tsconfig.json
Running the script inside main.ts is usually done with deno run.
shell
deno run --allow-net --config tsconfig.json main.ts
Compiling the program into a runnable executable is done with deno compile.
shell
deno compile --allow-net --config tsconfig.json main.ts
The tsconfig.json manifest configures compiler options for TypeScript. This one adds the browser DOM API type libraries (dom, dom.iterable, dom.asynciterable) alongside the Deno namespace (deno.ns).
json
{
"compilerOptions": {
"noFallthroughCasesInSwitch": true,
"noImplicitAny": false,
"noImplicitReturns": true,
"noImplicitThis": true,
"noUncheckedIndexedAccess": true,
"noUnusedLocals": false,
"noUnusedParameters": false,
"strict": true,
"strictBindCallApply": true,
"strictFunctionTypes": true,
"strictNullChecks": false,
"strictPropertyInitialization": false,
"lib": [
"dom",
"dom.iterable",
"dom.asynciterable",
"deno.ns"
]
}
}
Specifics & Details
Since this is supposed to be a command-line program, the first step is setting up some boilerplate. The program name, version, help flags, and help function are declared.
javascript
const program = "exoference";
const version = "0.0.1";
const helpFlags = ["-h", "-help", "--help"];
const help = () => {
return `
Usage: ${program} [FLAGS]... [ARGUMENTS]...
The program ${program} shall generate a list of anchor reference links
as partial HTML output.
Command List:
${program} https://example.com Dump anchor reference links from
specified URL to HTML output.
${program} --pretty https://example.com Dump anchor reference links from
specified URL to text output.
${program} --help Show this help menu.
Version: ${version}
`.trim();
};
Import DOMParser for DOM traversal. The references function accepts a URL (Uniform Resource Locator) and finds all anchor elements on the page. Passing the input through the URL API adds an implicit format check, because new URL throws a TypeError when the address is malformed. When each link is resolved, the second argument to new URL serves as the base: relative URLs are completed against the page URL.
Duplicates are removed by briefly turning the references array into a Set. The spread operator then expands the items of the set (an iterable) back into an array.
javascript
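import { DOMParser } from "https://deno.land/x/deno_dom@v0.1.31-alpha/deno-dom-wasm.ts";
// Fetch a page and return its unique, absolute anchor URLs.
const references = async (address) => {
  const url = new URL(address); // throws a TypeError on a malformed URL
  const page = await fetch(url.href);
  const dom = new DOMParser().parseFromString(await page.text(), "text/html");
  const anchors = dom.getElementsByTagName("a");
  const references = [];
  for (const anchor of anchors) {
    const href = anchor.getAttribute("href");
    if (!href) continue; // skip anchors without an href, e.g. <a name="top">
    try {
      // Resolve relative links against the page URL.
      references.push(new URL(href, url).href);
    } catch {
      // Ignore href values that cannot be parsed as URLs.
    }
  }
  // A Set drops duplicates; spreading it turns it back into an array.
  return [...new Set(references)];
};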
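The resolve-and-deduplicate step can be pulled out and exercised without any network access. The sketch below assumes a hypothetical helper, resolveReferences, that receives the raw href values (null for anchors without one) and the page URL as a base. It skips values new URL cannot parse, resolves the rest, and keeps the first occurrence of each absolute URL.

```typescript
// Hypothetical helper: resolve raw href values against a base URL and
// drop duplicates, keeping the first occurrence of each absolute URL.
const resolveReferences = (hrefs: Array<string | null>, base: string): string[] => {
  const resolved: string[] = [];
  for (const href of hrefs) {
    if (!href) continue; // anchors without an href attribute
    try {
      // The second argument supplies the base for relative links.
      resolved.push(new URL(href, base).href);
    } catch {
      // Skip values that cannot be parsed as URLs.
    }
  }
  // A Set preserves insertion order and drops repeats; spread it back into an array.
  return [...new Set(resolved)];
};

const example = resolveReferences(
  ["/about", "https://example.com/about", "contact.html", null, "#top"],
  "https://example.com/blog/",
);
console.log(example);
```

Here "/about" and "https://example.com/about" collapse into one entry. Fragment-only links such as "#top" still count as distinct URLs. Removing the hash (url.hash = "") before adding a link would fold them into the page URL.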
The html function prints partial HTML (HyperText Markup Language) to the console as an ordered list of references.
javascript
const html = (references) => {
  console.log("<ol>");
  // Print each reference as a linked list item.
  for (const reference of references) {
    console.log(`<li><a href="${reference}">${reference}</a></li>`);
  }
  console.log("</ol>");
};
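Logging straight to the console makes html awkward to test, and a URL containing & is technically supposed to be written as &amp; inside an attribute value. The following is a minimal alternative sketch. It assumes a hypothetical toHtmlList function that returns the same ordered list as a string and escapes the HTML-special characters first.

```typescript
// Escape the characters that are special inside HTML text and attribute values.
const escapeHtml = (text: string): string =>
  text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");

// Hypothetical variant of html(): return the ordered list instead of logging it.
const toHtmlList = (references: string[]): string => {
  const items = references.map((reference) => {
    const safe = escapeHtml(reference);
    return `<li><a href="${safe}">${safe}</a></li>`;
  });
  return ["<ol>", ...items, "</ol>"].join("\n");
};

console.log(toHtmlList(["https://example.com/?a=1&b=2"]));
```

Returning a string keeps output formatting separate from I/O. The caller can still console.log it, or write it into the end of a post.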
The pretty function replicates the -dump and -listonly arguments of Lynx and prints a numbered list of references as plain text. The command-line flags in prettyFlags switch the program to this function.
javascript
const prettyFlags = ["-p", "-pretty", "--pretty"];
const pretty = (references) => {
  console.log("References", "\n");
  // Number references from 1, like lynx -dump -listonly.
  references.forEach((reference, index) => {
    console.log(`${index + 1}.`, reference);
  });
};
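Lynx indents the reference numbers in its dump so the URLs line up. The sketch below imitates that layout. It assumes a hypothetical toPrettyList function that returns the text instead of printing it, and it pads every number to a fixed width of four characters. That width is an arbitrary choice for this sketch, not Lynx’s documented formatting rule.

```typescript
// Hypothetical variant of pretty(): return the list as a string and pad each
// number to a fixed width so the URLs line up in a single column.
const toPrettyList = (references: string[], width = 4): string => {
  const lines = references.map(
    (reference, index) => `${String(index + 1).padStart(width)}. ${reference}`,
  );
  return ["References", "", ...lines].join("\n");
};

console.log(toPrettyList(["https://www.iana.org/domains/example"]));
```

Numbers up to four digits stay right-aligned. Past that, padStart leaves them unpadded and the column drifts.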
The main function picks a case by reading the first and second arguments along with the number of arguments. It checks the first argument against the helpFlags and prettyFlags arrays before treating it as a URL. Because new URL throws on invalid input instead of returning a falsy value, a small isUrl helper turns that check into a boolean.
typescript
const firstArgument: string = Deno.args[0];
const secondArgument: string = Deno.args[1];
const numberOfArguments: number = Deno.args.length;
// new URL() throws on invalid input instead of returning a falsy value,
// so wrap it in try/catch to get a usable boolean check.
const isUrl = (text: string): boolean => {
  try {
    new URL(text);
    return true;
  } catch {
    return false;
  }
};
const main = async (numberOfArguments: number) => {
  if (numberOfArguments === 0) return console.log(help());
  if (helpFlags.includes(firstArgument)) return console.log(help());
  if (prettyFlags.includes(firstArgument) && isUrl(secondArgument)) {
    return pretty(await references(secondArgument));
  }
  if (isUrl(firstArgument)) return html(await references(firstArgument));
  return console.log(`Unknown argument ${firstArgument}`);
};
main(numberOfArguments);
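The dispatch logic becomes easier to test once it is separated from fetching and printing. The sketch below works under that assumption. A hypothetical parseArgs function maps the raw argument list (for example Deno.args) to a tagged Command value, using the same flag arrays as the program. The Command type and both function names are illustrative and not part of the original program.

```typescript
const helpFlags = ["-h", "-help", "--help"];
const prettyFlags = ["-p", "-pretty", "--pretty"];

// Hypothetical command type: what the program should do for a given argument list.
type Command =
  | { kind: "help" }
  | { kind: "html"; url: string }
  | { kind: "pretty"; url: string }
  | { kind: "unknown"; argument: string };

// Type guard: true only when new URL() accepts the text.
const isUrl = (text: string | undefined): text is string => {
  if (text === undefined) return false;
  try {
    new URL(text);
    return true;
  } catch {
    return false;
  }
};

// Map raw arguments to a Command without any side effects.
const parseArgs = (args: readonly string[]): Command => {
  const first: string | undefined = args[0];
  const second: string | undefined = args[1];
  if (first === undefined || helpFlags.includes(first)) return { kind: "help" };
  if (prettyFlags.includes(first)) {
    // Report the bad second argument, or the flag itself if it is missing.
    return isUrl(second) ? { kind: "pretty", url: second } : { kind: "unknown", argument: second ?? first };
  }
  if (isUrl(first)) return { kind: "html", url: first };
  return { kind: "unknown", argument: first };
};

console.log(parseArgs(["--pretty", "https://example.com"]));
```

With the parsing isolated, main reduces to a switch on command.kind that calls help, pretty, or html.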
This is good enough to compile. The resulting binary comes out at 78MB (megabytes) on my x86_64 (the 64-bit version of the x86 instruction set) Linux laptop.
Conclusion
This might be a very simple starting point, but the opportunities for more features are endless. Filtering, automatic link rot detection, and caching the reference sources are just a small subset of possible features.
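Filtering is one of those features. The sketch below assumes a hypothetical splitByOrigin helper that separates references on the same origin as the page from external ones. Outbound links are usually the ones most exposed to link rot.

```typescript
// Hypothetical filter: split references into same-origin and external links.
const splitByOrigin = (references: string[], page: string) => {
  const origin = new URL(page).origin;
  const internal: string[] = [];
  const external: string[] = [];
  for (const reference of references) {
    // Non-hierarchical schemes such as mailto: have the opaque origin "null",
    // so they always land in the external bucket.
    (new URL(reference).origin === origin ? internal : external).push(reference);
  }
  return { internal, external };
};

const { internal, external } = splitByOrigin(
  ["https://example.com/about", "https://www.iana.org/domains/example", "mailto:someone@example.com"],
  "https://example.com/",
);
console.log(internal, external);
```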