Recently, I needed to extract a set of port numbers from a .json file in order to connect locally to an IPython kernel instance running on a remote host, and learned that it is possible to use zero-width assertions with grep. Zero-width assertions are regular expression constructs that match at a position without consuming any characters, so they can be used to anchor your regular expression within the target text.
The IPython kernel file structure is consistent across invocations, but each new file contains different random port numbers which need to be forwarded between local and remotehost. These ports facilitate the connection to the instance of the IPython kernel running on remotehost. What follows is the content of a typical kernel file with randomly-generated ports (the filename is also random, but follows the format kernel-####.json):
{
  "iopub_port": 39736,
  "control_port": 59725,
  "transport": "tcp",
  "shell_port": 51963,
  "key": "1fcd997c-ef64-4322-8762-c034af6095e1",
  "stdin_port": 59714,
  "signature_scheme": "hmac-sha256",
  "hb_port": 41128,
  "ip": "127.0.0.1"
}
Our goal is to extract and forward the randomly generated port numbers using
ssh. One approach is provided in the IPython Cookbook recipe
Connecting to a remote kernel via ssh,
which extracts the ports and forwards them iteratively:
#!/bin/bash
# Assume kernel connection details reside in `kernel-2323.json`
for port in $(cat kernel-2323.json | grep '_port' | grep -o '[0-9]\+'); do
    ssh remotehost.com -f -N -L $port:127.0.0.1:$port
done
In the call to ssh, the options -f -N -L
specify:
-f
Requests ssh to go to background just before command execution.
-N
Do not execute a remote command. This is useful for just forwarding ports.
-L
Specifies that connections to the given port on localhost are to be
forwarded to the port on remotehost.
The cookbook method works without issue. However, the pattern extracts port
numbers without consideration of the port’s associated kernel component. For
example, if we needed to know which port corresponds to shell_port, the
cookbook solution falls short.
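To make the limitation concrete, here is a small sketch that runs the cookbook's two-stage filter on a scratch copy of the sample kernel file and skips the ssh call. The temporary file and the variable names are assumptions made for illustration only.

```shell
#!/bin/bash
# Hypothetical scratch copy of the sample kernel file shown earlier.
kernel_file="$(mktemp)"
cat > "${kernel_file}" <<'EOF'
{
  "iopub_port": 39736,
  "control_port": 59725,
  "transport": "tcp",
  "shell_port": 51963,
  "key": "1fcd997c-ef64-4322-8762-c034af6095e1",
  "stdin_port": 59714,
  "signature_scheme": "hmac-sha256",
  "hb_port": 41128,
  "ip": "127.0.0.1"
}
EOF

# Same filter as the cookbook loop: keep the *_port lines, then keep only digits.
ports="$(grep '_port' "${kernel_file}" | grep -o '[0-9]\+')"

# Prints five bare numbers, one per line, in file order:
# 39736, 59725, 51963, 59714, 41128
echo "${ports}"

rm -f "${kernel_file}"
```

The output carries no names, so nothing in it says that 51963 belongs to shell_port.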
An alternative approach uses grep’s zero-width assertion operator \K. This
escape sequence isn’t listed in grep’s help menu or man page, but is nonetheless
valid syntactically when the -P flag is given (-P indicates that the pattern is
a Perl regular expression). With -P, simply provide grep with any valid Perl
regular expression pattern. If \K is included within the regular expression,
only the matching text that follows it will be returned, and only if what
precedes it also matches. This behaves much like a positive lookbehind
assertion, although strictly speaking \K matches the preceding text and then
resets the starting point of the reported match.
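As a minimal sketch of the idea, assuming a GNU grep built with -P support, the following compares \K with an equivalent lookbehind on a single line taken from the sample kernel file. The variable names are illustrative.

```shell
#!/bin/bash
# One line from the sample kernel file, including its leading indentation.
line='  "shell_port": 51963,'

# \K drops everything matched so far, so -o reports only the digits.
with_k="$(printf '%s\n' "${line}" | grep -Po '"shell_port":[[:space:]]+\K[0-9]+')"

# A positive lookbehind gives the same result here, but it has to spell out
# the exact prefix, including the single space after the colon.
with_lookbehind="$(printf '%s\n' "${line}" | grep -Po '(?<="shell_port": )[0-9]+')"

echo "${with_k}"            # 51963
echo "${with_lookbehind}"   # 51963
```

The \K form tolerates any amount of whitespace after the colon, which is one practical reason to prefer it in this situation.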
The next example parses kernel-2323.json as before, but this time retains the component-to-port mapping. After extracting the kernel component names and ports, we save them to an associative array:
#!/bin/bash
KERNEL_FILENAME="kernel-2323.json"
declare -a portsArr=('hb_port' 'iopub_port' 'control_port' 'shell_port' 'stdin_port')
declare -A kernelDict # Associative array to hold component-port mapping
for portname in "${portsArr[@]}"
do
    # \K drops the matched prefix, so grep -o prints only the digits.
    PATTERN="[[:space:]]+\"${portname}\":[[:space:]]+\K[0-9]{2,5}"
    # Quote the pattern so the shell does not glob or split it.
    PORTNBR=$(grep -Po "${PATTERN}" "${KERNEL_FILENAME}")
    echo "Now forwarding ${portname}..."
    ssh remotehost.com -f -N -L "${PORTNBR}:127.0.0.1:${PORTNBR}"
    # Add component-port mapping to kernelDict.
    kernelDict["${portname}"]="${PORTNBR}"
done
Notice the placement of \K: at each iteration, the pattern specifies that a
matching string will contain leading whitespace, the quoted port name, a colon
and one or more whitespace characters, followed by 2-5 digits. Since \K
directly precedes “[0-9]{2,5}”, successful matches will only return that
portion of a matching string.
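The effect of the placement can be checked by running the same pattern with and without \K. This is a small sketch on one line of the sample file, again assuming grep supports -P; the `prefix` variable is introduced only for illustration.

```shell
#!/bin/bash
# One line from the sample kernel file, including its leading indentation.
line='  "hb_port": 41128,'
prefix='[[:space:]]+"hb_port":[[:space:]]+'

# Without \K, -o prints the entire match, prefix included.
whole_match="$(printf '%s\n' "${line}" | grep -Po "${prefix}[0-9]{2,5}")"

# With \K placed just before the digits, only the digits are printed.
digits_only="$(printf '%s\n' "${line}" | grep -Po "${prefix}\K[0-9]{2,5}")"

printf '[%s]\n' "${whole_match}"   # [  "hb_port": 41128]
printf '[%s]\n' "${digits_only}"   # [41128]
```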
Our implementation works as expected, but is inefficient: for each port number
extracted and forwarded, the kernel file is reopened and reread. For this
example it’s not much of a problem, but for larger files, this approach could
noticeably degrade performance. A more efficient solution reads the kernel
file once, stores its contents in a variable, and searches that variable
against the regular expression pattern at each iteration. The change in logic
is subtle: the only differences are reading the file into the variable
KERNEL_CONTENTS at the start of the script, and the inclusion of <<< after the
grep command:
#!/bin/bash
KERNEL_FILENAME="kernel-2323.json"
# Read the kernel file once; later searches run against this variable.
KERNEL_CONTENTS="$(cat "${KERNEL_FILENAME}")"
declare -a portsArr=('hb_port' 'iopub_port' 'control_port' 'shell_port' 'stdin_port')
declare -A kernelDict # Associative array to hold component-port mapping
for portname in "${portsArr[@]}"
do
    PATTERN="[[:space:]]+\"${portname}\":[[:space:]]+\K[0-9]{2,5}"
    # Quote the pattern so the shell does not glob or split it.
    PORTNBR=$(grep -Po "${PATTERN}" <<< "${KERNEL_CONTENTS}")
    echo "Now forwarding ${portname}..."
    ssh remotehost.com -f -N -L "${PORTNBR}:127.0.0.1:${PORTNBR}"
    # Add component-port mapping to kernelDict.
    kernelDict["${portname}"]="${PORTNBR}"
done
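Once the loop has finished, the mapping can be queried by component name. The sketch below fills kernelDict by hand with the values from the sample file rather than parsing it, and assumes Bash 4 or later, since associative arrays are not available in older versions.

```shell
#!/bin/bash
# Hand-filled stand-in for the mapping the loop above would build.
declare -A kernelDict=(
    [hb_port]=41128
    [iopub_port]=39736
    [control_port]=59725
    [shell_port]=51963
    [stdin_port]=59714
)

# Look up a single component by name.
shell_port="${kernelDict[shell_port]}"
echo "shell_port is forwarded on ${shell_port}"

# List every mapping; Bash does not guarantee the iteration order.
for portname in "${!kernelDict[@]}"; do
    printf '%s -> %s\n' "${portname}" "${kernelDict[${portname}]}"
done
```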
The <<< syntax is used to indicate a here string, a form of input
redirection which passes the expanded text to a command’s standard input,
allowing variables containing text to be read as though they were a file.
See this link for more information.
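As a rough illustration of how a here string behaves, the sketch below searches a variable with <<< and with an ordinary pipeline and gets the same result. The shortened `contents` value is made up for the example, and grep is again assumed to support -P.

```shell
#!/bin/bash
# Shortened stand-in for the kernel file contents.
contents='{
  "shell_port": 51963,
  "hb_port": 41128
}'

pattern='"hb_port":[[:space:]]+\K[0-9]+'

# Here string: the expanded variable becomes grep's standard input.
from_here_string="$(grep -Po "${pattern}" <<< "${contents}")"

# Roughly equivalent pipeline, at the cost of an extra process.
from_pipe="$(printf '%s\n' "${contents}" | grep -Po "${pattern}")"

echo "${from_here_string}"   # 41128
echo "${from_pipe}"          # 41128
```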
That’s it for now. Until next time, happy coding!