GitHub - ericchiang/pup - Parsing HTML at the command line

Gah, I wish this little CLI tool were still maintained, because it’s amazingly useful. It lets you take any HTML and extract the data matching a given CSS selector.
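
For anyone who hasn't used it, here is a minimal sketch of the idea, assuming pup is installed and on your PATH. The HTML snippet is made up for illustration; `text{}` and `attr{href}` are pup's display functions for printing text content and an attribute value.

```shell
#!/bin/bash
# Made-up HTML for illustration; in practice this would come from curl or a file.
html='<html><head><title>Example Page</title></head><body><a href="/about">About</a></body></html>'

if command -v pup >/dev/null 2>&1; then
    # Print the text inside the <title> element
    printf '%s' "$html" | pup 'title text{}'
    # Print the href attribute of every <a> element
    printf '%s' "$html" | pup 'a attr{href}'
else
    echo "pup is not installed, skipping the example" >&2
fi
```

The selector comes first and the display function last, which is the same shape the script below uses to grab the page title.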

I just used it to write this bookmark, as part of a new janky bash script.

#!/bin/bash
CURRENT_YEAR=$(date +%Y)
CURRENT_MONTH=$(date +%m)
CURRENT_DAY=$(date +%d)
CURRENT_TIME=$(date +%H%M)
CURRENT_ISO8601=$(date +"%Y-%m-%dT%H:%M:%S%z")
BOOKMARK_DIR="./content/bookmarks/$CURRENT_YEAR/$CURRENT_MONTH/$CURRENT_DAY"
BOOKMARK_FILENAME="$CURRENT_TIME.md"
BOOKMARK_FILEPATH="$BOOKMARK_DIR/$BOOKMARK_FILENAME"
DESCRIPTION=""
TITLE="No Bookmark Title"
URL=""
BASEDOMAIN=""

# CHECK PARAMS
if [ -z "$1" ]; then
    echo "Bookmark URL is needed" >&2
    exit 1
fi

URL=$1

# MARKDOWN
echo "Creating $BOOKMARK_FILEPATH"
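mkdir -p "$BOOKMARK_DIR"
touch "$BOOKMARK_FILEPATH"

# Keep the default title if the page has no <title> or the fetch fails
FETCHED_TITLE=$(curl -s "$URL" | pup 'title:first-of-type text{}')
TITLE=${FETCHED_TITLE:-$TITLE}
# Split on "/" and ":"; for scheme://host/path the host is field 4
BASEDOMAIN=$(echo "$URL" | awk -F'[/:]' '{print $4}')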

# CREATE FILE
cat << EOF > "$BOOKMARK_FILEPATH"
---
title: $TITLE
date: "$CURRENT_ISO8601"
bookmark_of: $URL
basedomain: $BASEDOMAIN
---

$DESCRIPTION

EOF
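
The awk one-liner for the base domain is the least obvious part of the script, so here is a small sketch of it on its own. The `base_domain` function name is hypothetical (it is not in the script above), and it assumes the URL includes a scheme such as `https://`; without one, field 4 would be something else entirely.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: print the host of a URL.
base_domain() {
    # Split on "/" and ":". For scheme://host[:port]/path the fields are
    # scheme, "", "", host, ... so the host is field 4.
    printf '%s\n' "$1" | awk -F'[/:]' '{print $4}'
}

base_domain "https://github.com/ericchiang/pup"
base_domain "http://localhost:8080/index.html"
```

Quoting the `-F'[/:]'` pattern matters: left unquoted, the shell may expand it as a glob if a file named `-F/` or `-F:` happens to exist in the current directory.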

Original link and Wayback Machine link