doge

Zirun Zhu's
Homepage

Education

2014–present

PhD student,
Department of Informatics,
School of Multidisciplinary Sciences,
SOKENDAI (The Graduate University for Advanced Studies) and
National Institute of Informatics

2010–2014

Bachelor of Engineering,
Computer Science and Technology,
University of Science and Technology of China


Research (Interests and Projects)

I am interested in functional programming languages and designing domain-specific languages (DSLs) for solving practical problems, by taking advantage of relevant theoretical research. I enjoy metaprogramming and source-to-source code generation, which are the daily work required by my doctoral research—bidirectional transformations (BX), which are an elegant approach to synchronising data in different formats. In particular, I am focusing on the synchronisation problem between the program text and its abstract syntax representation.

Bidirectional Parsing and Printing (BiYacc)

As an application of BX, I developed a DSL, BiYacc, which can generate a pair of parser and printer that are consistent with each other from a single specification, for synchronising program text and an abstract syntax tree (AST) that is possibly modified, for instance, by some refactoring algorithm, from the one directly parsed from the program text. Different from conventional methods, parser and printer pairs produced by BiYacc will try to preserve comments, layouts, and other syntactic objects during the synchronisation. BiYacc has been adopted as a part of the system (developed by Daisuke Kinoshita and Keisuke Nakano) to synchronise Coq functions and their corresponding Ocaml programs.

Grammar Disambiguation (Bi-filters)

Conventional approaches are to develop disambiguation treatments either for the parsing direction or the printing direction. As a result, we do not know if two arbitrarily picked treatments, one for the parsing direction and the other for the printing direction, will work in harmony, i.e. does not break the consistency between the parser and printer (plus disambiguation). In response to the problem, based on generalised parsing and (unidirectional) filters, I propose bidirectionalised filters (bi-filters) which work both in the parsing direction and in the printing direction; bi-filters satisfy bi-filter laws and will not affect the consistency of the whole parsing and printing process. I have designed bi-filter directives and integrated the functionality into BiYacc.

Retentive Printing (Retentive Lenses)

When synchronising program text and an AST that is possibly modified, for instance, by some refactoring algorithm, from the one directly parsed from the program text, the preservation of comments, layouts, and syntactic objects is not always guaranteed, even for tools like BiYacc. To remedy the situation, I propose retentive lenses (for BX in general), which satisfy Retentiveness and theoretically guarantee information retention during synchronisation on demand.

Document Synchronisation (BiPandoc)

Pandoc is a famous document converter. As a real-world application demonstrating the use of BX, together with Zhixuan Yang, I develop the prototype of BiPandoc, i.e. bidirectional Pandoc, that synchronises documents in HTML and Markdown formats.

Encryption for Fun

Encrypt your real and fake text into one.


Papers and Posters

[NGC (draft, under review)]
Bidirectional Parsing and Reflective Printing for Fully Disambiguated Grammars
Zirun Zhu, Hsiang-Shang Ko, Yongzhe Zhang, Pedro Martins, João Saraiva, and Zhenjiang Hu

Research paper (draft) Under the review of NGC

Language designers usually need to implement parsers and printers. Despite being two closely related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It will be more convenient if the parser and printer can be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar.

To address these needs, we propose a domain-specific language BIYACC, whose programs denote both a parser and a reflective printer for a fully disambiguated context- free grammar. BIYACC is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BIYACC is capable of facilitating many tasks such as Pombrio and Krishnamurthi’s ‘resugaring’, simple refactoring, and language evolution.

[ICFP 2019 (draft, rejected)]
Retentive Lenses
Zirun Zhu, Zhixuan Yang, Hsiang-Shang Ko, and Zhenjiang Hu

Research paper (draft) Under the review of ICFP 2019

Based on Foster et al.'s [2007] lenses, various bidirectional programming languages and systems have been developed for helping the user to write correct data synchronisers. The two well-behavedness laws of lenses, namely Correctness and Hippocraticness, are usually adopted as the guarantee of these systems. While lenses are designed to retain information in the source when the view is modified, well-behavedness says very little about the retaining of information, since Hippocraticness only requires that the source be unchanged if the view is not modified. Thus nothing about retaining is guaranteed when the view is changed.

To address this problem, we propose an extension of the original lenses, called retentive lenses, satisfying a new Retentiveness law which can guarantee that if parts of the view are unchanged, then the corresponding parts of the source are retained as well. As a concrete example of retentive lenses, we present a domain-specific language for writing tree transformations. We prove that the pair of get and put functions generated from a program in our DSL forms a retentive lens. We study the practical use of retentive lenses by implementing in our DSL a synchroniser between concrete and abstract syntax trees for a small subset of Java, and demonstrate that Retentiveness helps to enforce the retaining of comments and syntactic sugar in code refactoring.

[ICFP 2018 (draft, rejected)]
Retentive Lenses
Zirun Zhu, Hsiang-Shang Ko, Zhixuan Yang, and Zhenjiang Hu

Research paper (draft) Rejected by ICFP 2018

Researchers in the field of bidirectional transformations have studied data synchronisation for a long time and proposed various properties, of which well-behavedness is the most fundamental. However, well-behavedness is not enough to characterise the result of an update (performed by put), regarding what information should be retained in the updated source. The root cause is that the property, Hippocraticness, for guaranteeing the retention of source information is too ‘global’, only requiring that the whole source should be unchanged if the whole view is.

In this paper we propose a new property retentiveness, which enables us to directly reason about the retention of source information locally. Central to our formulation of retentiveness is the notion of links, which are used to relate fragments of sources and views. These links are passed as additional input to the extended put function, which produces a new source in a way that preserves all the source fragments attached to the links. We validate the feasibility of retentiveness by designing a domain-specific language (DSL) supporting mutually recursive algebraic data types. We prove that any program written in our DSL gives rise to a pair of retentive get and put. We show the usefulness of retentiveness by presenting examples in two different research areas: resugaring and code refactoring.

[SLE 2016]
Parsing and Reflective Printing, Bidirectionally
Zirun Zhu, Yongzhe Zhang, Hsiang-Shang Ko, Pedro Martins, João Saraiva, and Zhenjiang Hu

Research paper SLE 2016

Language designers usually need to implement parsers and printers. Despite being two intimately related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It will be more convenient if the parser and printer can be unified and developed in one single program, with their consistency guaranteed automatically.

Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can reflect the modification to the program text while preserving layouts, comments, and syntactic sugar.

To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for an unambiguous context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the pairs of parsers and reflective printers generated by BiYacc are consistent. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi’s “resugaring”, simple refactoring, and language evolution.

[FLOPS 2016 (poster presentation)]
BiYacc: Roll Your Parser and Reflective Printer into One.
Zirun Zhu, Yongzhe Zhang

Poster presentation FLOPS 2016

Language designers usually need to implement parsers and printers. Despite being two intimately related programs, in practice they are often designed separately. This leads to a maintenance problem: as a language evolves, both its parser and printer need to be revised and kept consistent, which is a routine but error-prone task.

To address the problem, we propose a new domain-specific language BiYacc, whose programs denote both a parser and a “reflective” printer for synchronising the concrete and abstract representations. BiYacc is based on the theory of bidirectional transformations, which guarantees that the pairs of parsers and printers generated by BiYacc are always consistent in a precise sense. The “reflective” printer generated from a BiYacc program is distinct from traditional printers: it takes not only an abstract syntax tree (AST) but also a concrete syntax tree (CST), and produces an updated CST into which information from the AST is properly embedded. Internally, BiYacc is implemented in terms of BiGUL, a core bidirectional programming language which has been formally proved correct and implemented as a Haskell library. The consistency of parsers and printers is thus derived from the correctness of BiGUL for free.

[BX 2015]
"BiYacc: Roll Your Parser and Reflective Printer into One."
Zirun Zhu, Hsiang-Shang Ko, Pedro Martins, João Saraiva, and Zhenjiang Hu

Tool paper BX 2015

Language designers usually need to implement parsers and printers. Despite being two related programs, in practice they are designed and implemented separately. This approach has an obvious disadvantage: as a language evolves, both its parser and printer need to be separately revised and kept synchronised. Such tasks are routine but complicated and error-prone. To facilitate these tasks, we propose a language called BiYacc, whose programs denote both a parser and a printer. In essence, BiYacc is a domain-specific language for writing putback-based bidirectional transformations — the printer is a putback transformation, and the parser is the corresponding get transformation. The pairs of parsers and printers generated by BiYacc are thus always guaranteed to satisfy the usual round-trip properties. The highlight that distinguishes this reflective printer from others is that the printer — being a putback transformation — accepts not only an abstract syntax tree but also a string, and produces an updated string consistent with the given abstract syntax tree. We can thus make use of the additional input string, with mechanisms such as simultaneous pattern matching on the view and the source, to provide users with full control over the printing-strategies.


Contact

Programming Research Lab, Room 1611,
National Institute of Informatics,
2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan

Email:


PhD Advisor

Zhenjiang Hu

Professor at Peking University
Professor by Special Appointment at National Institute of Informatics
Visiting Professor at SOKENDAI (The Graduate University for Advanced Studies)