Hangul jamo diff,dwdiff

I experimented with diffing Korean text, using the python3 package below.


A Guide to using Python-Jamo — jamo 0.4-beta documentation

Firstly, let’s extract jamo from a random word.

$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from jamo import (hcj2j,j2hcj,h2j)
>>> j2hcj(h2j("What is 한글?"))
'What is ㅎㅏㄴㄱㅡㄹ?'
>>> print(hcj2j("ㅎ","lead")+hcj2j("ㅏ")+hcj2j("ㄴ","tail")+hcj2j("ㄱ","lead")+hcj2j("ㅡ","vowel")+hcj2j("ㄹ","tail"))
한글

Let’s compare the jamo string, which is the final line in the REPL output above, with the original string, “한글”.

You may not see the difference between the two strings in the terminal logs above and below, because browsers seem to render a sequence of jamo as composed Korean characters. In a terminal, however, the two appear different.
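To make the difference visible no matter how a browser or terminal renders it, the code points can be listed. This is a minimal sketch using only the standard library's unicodedata module. The variable names and the describe helper are made up for illustration, and both strings are written as escapes so that copying the text cannot silently recompose them.

```python
import unicodedata

# "한글" as two precomposed Hangul syllables
precomposed = "\ud55c\uae00"
# The same word as six conjoining jamo (lead, vowel, tail for each syllable)
conjoining = "\u1112\u1161\u11ab\u1100\u1173\u11af"

def describe(text):
    """Return one 'U+XXXX NAME' entry per code point (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in (("precomposed", precomposed), ("conjoining", conjoining)):
    print(label, len(text), "code points")
    for entry in describe(text):
        print("   ", entry)
```

The first string has 2 code points and the second has 6, even though both are usually drawn as the same two characters.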


$ diff <(echo "한글") <(echo "한글")
< 한글
> 한글
$ dwdiff <(echo "한글") <(echo "한글")

So the ‘diff’ command saw these 2 strings as different, while ‘dwdiff’ treated them as identical.
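One plausible explanation, which is an assumption and not verified against dwdiff's source, is that diff compares raw bytes while dwdiff compares text after Unicode normalization. The sketch below shows that idea with the standard library only; same_text is a hypothetical helper name.

```python
import unicodedata

precomposed = "\ud55c\uae00"                         # 한글 as 2 syllable code points
conjoining = "\u1112\u1161\u11ab\u1100\u1173\u11af"  # 한글 as 6 conjoining jamo

def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# A plain comparison behaves like a byte-wise diff: the strings differ.
print(precomposed == conjoining)
# After normalization the two spellings compare equal.
print(same_text(precomposed, conjoining))
```

NFC composes jamo into syllables and NFD decomposes syllables into jamo, so either form works as long as both sides use the same one.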

python 2to3 one-liner

Cloudera uses python in some components, so the end of python2 support is a frequent topic at work.

Red Hat has committed to supporting it for RHEL 8 customers until 2024.

Python 2 bows out after epic transition. And there was much applause because you’ve all moved to version 3, right? Uh, right? • The Register

Today I learned that SimpleHTTPServer in python2 was merged into http.server in python3.

class Handler(SimpleHTTPServer.SimpleHTTPRequestHandler):

python SimpleHTTPServer with custom MIME types
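For reference, a python3 version of such a server might look like the sketch below. It is not the script from the linked post: the ".md" entry is a made-up example of a custom MIME type, and serve is a hypothetical helper name. Only http.server and socketserver from the standard library are used.

```python
import http.server
import socketserver

PORT = 8000

class Handler(http.server.SimpleHTTPRequestHandler):
    # Copy the default extension table and add one custom entry.
    # ".md" -> "text/markdown" is an illustrative assumption.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".md": "text/markdown",
    }

def serve(port=PORT):
    """Serve the current directory until interrupted (hypothetical helper)."""
    with socketserver.TCPServer(("", port), Handler) as httpd:
        print("serving at port", port)
        httpd.serve_forever()
```

Calling serve() starts the server on port 8000. The definition is kept separate from the call so the module can be imported without binding a port.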

I learned there’s a tool named ‘2to3’ that converts a python2 script to a python3 script. By default it does not modify the file; it writes the changes to stdout as a unified diff, which GNU patch (/usr/bin/patch) can apply.
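To show what that patch text looks like, here is a small sketch that builds a unified diff for a single line with the standard library's difflib. It does not run 2to3; the before/after lines and the "(original)"/"(refactored)" labels are filled in by hand to mimic the headers 2to3 prints.

```python
import difflib

# One python2 line and its python3 equivalent, written by hand for illustration.
before = ['print "serving at port", PORT\n']
after = ['print("serving at port", PORT)\n']

# unified_diff yields header lines followed by "-" and "+" lines.
patch_text = "".join(
    difflib.unified_diff(
        before,
        after,
        fromfile="http_server.py",
        tofile="http_server.py",
        fromfiledate="(original)",
        tofiledate="(refactored)",
    )
)
print(patch_text)
```

Text in this format is what the patch command consumes: lines starting with "-" are removed and lines starting with "+" are added.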

Below is the result of applying the patch to the python2 script from the URL quoted above and feeding the patched script to python3 without saving anything to the file system. This ran just fine on WSL (Windows Subsystem for Linux, Ubuntu 18.04) on Windows 10!

$ 2to3 http_server.py | tee >(patch -o - | python3)
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored http_server.py
RefactoringTool: Files that need to be modified:
RefactoringTool: http_server.py
--- http_server.py      (original)
+++ http_server.py      (refactored)
@@ -1,15 +1,15 @@
-import SimpleHTTPServer
-import SocketServer
+import http.server
+import socketserver

 PORT = 8000

-class Handler(SimpleHTTPServer.SimpleHTTPRequestHandler):
+class Handler(http.server.SimpleHTTPRequestHandler):

-httpd = SocketServer.TCPServer(("", PORT), Handler)
+httpd = socketserver.TCPServer(("", PORT), Handler)

-print "serving at port", PORT
+print("serving at port", PORT)
patching file - (read from http_server.py)
serving at port 8000
- - [27/Apr/2020 03:05:20] "GET / HTTP/1.1" 200 -
- - [27/Apr/2020 03:05:24] "GET /http_server.py HTTP/1.1" 200 -