nodejs xml2js one-liner

I write this blog with Open Live Writer. The editor has a feature that automatically embeds HREF links.

image

The settings shown in the screenshot above are saved in an XML file. It looks like this.

$ head -5 /mnt/c/Users/$LOGNAME/AppData/Local/Packages/\
*.OpenLiveWriter_*/LocalCache/Roaming/OpenLiveWriter/LinkGlossary/linkglossary.xml
<?xml version="1.0" encoding="utf-8"?>
<glossary>
  <entry>
    <text>Open Live Writer site</text>
    <url>http://www.OpenLiveWriter.org</url>

Let's convert it with xml2js, a Node module that parses XML into a JavaScript object that can be written out as JSON.

var parser = new xml2js.Parser();

xml2js – npm

I forced (lol) the linked part of the page above into a one-liner, and then piped the result to jq.

$ node -e \
  "var fs=require('fs'),xml2js=require('xml2js');
   fs.readFile(0,function(err,data){
     new xml2js.Parser().parseString(data,function(err,result){
       console.log(JSON.stringify(result));});});" \
< /mnt/c/Users/$LOGNAME/AppData/Local/Packages/*.OpenLiveWriter_*/LocalCache/Roaming/OpenLiveWriter/LinkGlossary/linkglossary.xml |
jq -r '.glossary.entry[]|"\(.text[0])\t\(.url[0])"'
Open Live Writer site   http://www.OpenLiveWriter.org
Cloudera        https://www.cloudera.com/
Hue     http://gethue.com/
Apache Hadoop Yarn      http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
#gcpug  https://twitter.com/hashtag/gcpug?f=tweets&vertical=default&src=hash
#openshiftjp    https://twitter.com/hashtag/openshiftjp?f=tweets&vertical=default&src=hash
#k8sjp  https://twitter.com/hashtag/k8sjp?f=tweets&vertical=default&src=hash
弊社    https://jp.cloudera.com/
仕事    https://jp.cloudera.com/
会社    https://jp.cloudera.com/
本業    http://jp.cloudera.com/
社内    https://jp.cloudera.com/
#secjaws        https://twitter.com/hashtag/secjaws?f=live

I think it would get even shorter with a Promise, so I'd like to write about it again if I get that working.
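
For comparison, here is a minimal sketch of the same extraction in Python's standard library instead of Node and jq. It is not the xml2js approach from this post. The `glossary_rows` helper and the `SAMPLE_XML` string are made up for illustration, and the sample assumes each `<entry>` closes after its `<url>`, which the truncated `head -5` output above does not show. xml2js wraps child elements in arrays by default, which is why the jq filter needs `.text[0]`; ElementTree has no such wrapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two entries taken from the output in this post.
# The closing tags are an assumption about the real file's layout.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<glossary>
  <entry>
    <text>Open Live Writer site</text>
    <url>http://www.OpenLiveWriter.org</url>
  </entry>
  <entry>
    <text>Hue</text>
    <url>http://gethue.com/</url>
  </entry>
</glossary>
"""


def glossary_rows(xml_bytes):
    """Return one 'text<TAB>url' string per <entry> element."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for entry in root.iter("entry"):
        text = entry.findtext("text", default="")
        url = entry.findtext("url", default="")
        rows.append(f"{text}\t{url}")
    return rows


if __name__ == "__main__":
    for row in glossary_rows(SAMPLE_XML.encode("utf-8")):
        print(row)
```

To run it against the real file, read linkglossary.xml in binary mode and pass the bytes to `glossary_rows`.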

Hangul jamo diff, dwdiff

I experimented with diffing Korean text, using the Python 3 package below.

j2hcj(h2j("자모=字母=jamo"))

A Guide to using Python-Jamo — jamo 0.4-beta documentation

First, let’s extract the jamo from an arbitrary word.

$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from jamo import (hcj2j,j2hcj,h2j)
>>> j2hcj(h2j("What is 한글?"))
'What is ㅎㅏㄴㄱㅡㄹ?'
>>> print(hcj2j("ㅎ","lead")+hcj2j("ㅏ")+hcj2j("ㄴ","tail")+hcj2j("ㄱ","lead")+hcj2j("ㅡ","vowel")+hcj2j("ㄹ","tail"))
한글

Let’s compare the string rebuilt from jamo, which is the final line of the REPL output above, with the original string, “한글”.

You can’t see the difference in the terminal logs above and below, because browsers seem to render a sequence of jamo as composed Korean characters. In a terminal, however, the two strings look different.
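
Since the two strings look the same in a browser, listing their code points makes the difference visible. The sketch below uses only the standard `unicodedata` module rather than the jamo package, and it assumes that the jamo built with `hcj2j` above are the same conjoining jamo that NFD normalization produces. The `describe` helper is a made-up name for illustration.

```python
import unicodedata

# "한글" written as two precomposed syllables (escapes avoid any ambiguity).
composed = "\ud55c\uae00"

# NFD splits each syllable into its conjoining jamo (lead, vowel, tail).
decomposed = unicodedata.normalize("NFD", composed)


def describe(s):
    """Return 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


if __name__ == "__main__":
    print(len(composed), describe(composed))
    print(len(decomposed), describe(decomposed))
    print(composed == decomposed)
```

The first string has two code points and the second has six, so a plain string comparison reports them as different even though they render alike.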

image

$ diff <(echo "한글") <(echo "한글")
1c1
< 한글
---
> 한글
$ dwdiff <(echo "한글") <(echo "한글")
한글

So the ‘diff’ command saw these two strings as different, but ‘dwdiff’ treated them as identical.
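
One way two such strings can compare as equal is Unicode normalization before the comparison. Whether dwdiff does this internally is not confirmed here; the sketch below only illustrates the idea with Python's standard library. The helper names `same_bytes` and `same_after_nfc` are hypothetical.

```python
import unicodedata


def same_bytes(a, b):
    """Byte-for-byte comparison of the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_after_nfc(a, b):
    """Compare after composing both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "한글" as precomposed syllables, and as six conjoining jamo.
composed = "\ud55c\uae00"
from_jamo = "\u1112\u1161\u11ab\u1100\u1173\u11af"

if __name__ == "__main__":
    print(same_bytes(composed, from_jamo))
    print(same_after_nfc(composed, from_jamo))
```

Normalizing both inputs to the same form first, and then running a byte-oriented diff, is one way to avoid spurious differences of this kind.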