The summit “Computational Linguistics Summit in the Era of Large Language Models cum International Symposium on Collaborative Innovations between The Hong Kong Polytechnic University and The China Computer Federation” was organized by the COMP Dept., PolyU on 22nd & 23rd Aug 2024. The first speaker of Day 2 was Dr. Xingshan Zeng (曾幸山) (Huawei Noah’s Ark Lab), and his topic was entitled “Advancing LLM Evaluation: Comprehensive Evaluation on Long-Context, Multi-Turn, and Instruction-Following”.
Firstly, Dr. Zeng briefed Large Language Model (LLM) benchmark evaluation, and his diagram showed where evaluation fits in the LLM development process. He then introduced a long-context evaluation benchmark named “M4LE”, which covers four dimensions: multi-ability, multi-range, multi-task and multi-domain. The overall concept of multi-turn abilities covered recollection, expansion, refinement and follow-up.
Finally, Dr. Zeng summarized the existing LLM evaluation systems and introduced the M4LE, MT-Eval and FollowBench evaluation methods as enhancements.
The second speaker was Dr. Zhongqing Wang (王中卿) (Soochow University), and his presentation was entitled “Metaphor and Synesthesia Analysis via Computational Linguistic Methods”. His talk covered textual metaphor and synesthesia analysis.
A metaphor is a figure of speech that directly
compares one thing to another, emphasizing the similarities between two
different concepts without using the words "like" or "as".
He then introduced different metaphor detection methods, such as those based on Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT).
Lastly, Dr. Wang discussed the distribution of sensory words for synesthesia detection.
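In the spirit of the sensory-word analysis described above, a minimal sketch of counting the distribution of sensory modalities in a text might look as follows. The tiny lexicon here is a made-up placeholder, not the dictionary used in Dr. Wang’s work:

```python
# Toy sketch: count how often words from each sensory modality appear.
# SENSORY_LEXICON is an invented placeholder for a real sensory-word lexicon.

from collections import Counter

SENSORY_LEXICON = {
    "bright": "sight", "dark": "sight",
    "loud": "sound", "quiet": "sound",
    "soft": "touch", "rough": "touch",
    "sweet": "taste", "bitter": "taste",
    "fragrant": "smell",
}


def sensory_distribution(text: str) -> Counter:
    """Map each known sensory word to its modality and count occurrences."""
    words = text.lower().split()
    return Counter(SENSORY_LEXICON[w] for w in words if w in SENSORY_LEXICON)


# "sweet voice" is itself a synesthetic pairing (taste word modifying sound).
dist = sensory_distribution("a sweet voice in the dark quiet night")
print(dist)  # Counter({'taste': 1, 'sight': 1, 'sound': 1})
```

A real system would of course use a curated lexicon and handle morphology, but the modality distribution it produces is the kind of signal the talk described for synesthesia detection.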
The third speaker was Prof. Haofen Wang (王昊奮) (Tongji University) and
his topic was “Knowledge Retrieval Augmentation: Paradigm and Key Technologies”.
Firstly, he briefed retrieval-augmented generation (RAG) and its development history.
Modular RAG is an advanced approach that breaks down
the RAG process into distinct, interchangeable modules, allowing for more
flexibility and customization in how information is retrieved and generated.
Prof. Wang discussed its opportunities.
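The modular idea above can be sketched in a few lines: retrieval and generation sit behind small interfaces and are injected into the pipeline, so either can be swapped without touching the rest. All class names and the toy corpus below are illustrative assumptions, not from the talk:

```python
# Minimal sketch of modular RAG: swappable retriever and generator modules.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    doc_id: str
    text: str


class KeywordRetriever:
    """Toy retriever: ranks documents by keyword overlap with the query."""

    def __init__(self, corpus: List[Document]):
        self.corpus = corpus

    def retrieve(self, query: str, top_k: int = 2) -> List[Document]:
        q_terms = set(query.lower().split())
        scored = sorted(
            self.corpus,
            key=lambda d: len(q_terms & set(d.text.lower().split())),
            reverse=True,
        )
        return scored[:top_k]


def template_generator(query: str, context: List[Document]) -> str:
    """Stand-in for an LLM call: stitches retrieved context into an answer."""
    snippets = "; ".join(d.text for d in context)
    return f"Q: {query} | grounded on: {snippets}"


class RAGPipeline:
    """Modules are injected, so a dense retriever or a real LLM generator
    could replace the toy ones without changing this class."""

    def __init__(self, retriever: KeywordRetriever, generator: Callable):
        self.retriever = retriever
        self.generator = generator

    def answer(self, query: str) -> str:
        docs = self.retriever.retrieve(query)
        return self.generator(query, docs)


corpus = [
    Document("d1", "RAG combines retrieval with generation"),
    Document("d2", "Knowledge bases improve traceability"),
]
pipeline = RAGPipeline(KeywordRetriever(corpus), template_generator)
print(pipeline.answer("what does RAG combine"))
```

The flexibility Prof. Wang described comes from exactly this kind of decoupling: each module (query rewriting, retrieval, reranking, generation) can be upgraded independently.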
He also discussed a knowledge-guided approach that uses a knowledge base to steer RAG retrieval and generation, enhancing traceability and reliability.
At the end, he summarized the RAG ecosystem, its prospects, paradigms and evaluation.
The fourth speaker was Prof. Yue Zhang (張岳) (Westlake University), and his presentation was entitled “LLM-generated Text Detection”. One of the motivations was that AI writing is hard to detect. His study aimed to distinguish AI-generated content (AIGC) from human writing.
After that, he briefed some existing approaches, including trained detectors and zero-shot detectors (likelihood-based and DetectGPT).
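The two zero-shot criteria mentioned above can be illustrated with a toy sketch. Here `log_prob` stands in for a real language model’s scoring function, and every number is fabricated purely to show the decision rules, not real model output:

```python
# Toy illustration of zero-shot LLM-text detection criteria.
# `toy_log_prob` and all scores below are invented for demonstration.

import statistics
from typing import Callable, List


def likelihood_score(log_prob: Callable[[str], float], text: str) -> float:
    """Likelihood baseline: average per-token log-probability.
    Machine-generated text tends to score higher under the model."""
    tokens = text.split()
    return log_prob(text) / max(len(tokens), 1)


def detectgpt_score(log_prob: Callable[[str], float], text: str,
                    perturbations: List[str]) -> float:
    """DetectGPT-style curvature: original log-prob minus the mean
    log-prob of perturbed rewrites. Machine text tends to sit near a
    local maximum of log-probability, so this gap is typically larger."""
    perturbed_mean = statistics.mean(log_prob(p) for p in perturbations)
    return log_prob(text) - perturbed_mean


# Fabricated scores: the candidate text and two perturbed rewrites.
fake_scores = {
    "the model writes this": -10.0,
    "the model writes that": -14.0,
    "the model wrote this": -13.5,
}
toy_log_prob = lambda t: fake_scores.get(t, -20.0)

gap = detectgpt_score(
    toy_log_prob,
    "the model writes this",
    ["the model writes that", "the model wrote this"],
)
print(f"curvature gap: {gap:.2f}")  # prints "curvature gap: 3.75"
```

A large positive gap suggests machine-generated text; in practice the perturbations come from a mask-filling model rather than hand-written variants.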
Finally, he summarized key features of Machine-generated
Text Detection (MAGE).
The last speaker was Dr. Derek F. Wong (黃輝) (University of Macau), and his topic was entitled “Prefix Text as a Yarn – Eliciting Non-English Alignment in Foundation Language Model”.
In the beginning, Dr. Wong briefed the global machine translation market and identified its significance to Macau and the Portuguese-speaking countries.
Then he briefed some challenges, including morphologically rich languages and syntactic differences. After that, Dr. Wong briefed his research on Neural Machine Translation (NMT). A summary of machine translation development was discussed. Dr. Wong concluded that supervised fine-tuning (SFT) based alignment may be “superficial”.