Shang-Yi Chuang

ML Researcher | ASR R&D

Apple | Siri

Profile

Shang-Yi Chuang is a Machine Learning Researcher working on Automatic Speech Recognition on the Siri team at Apple. She received her master’s degree in Computer Science from Cornell Tech. She has experience with international collaborations in the United States, Japan, and Taiwan.

Prior to Cornell Tech, she worked on artificial intelligence at Academia Sinica, Taiwan. Her research interests include natural language processing, speech processing, and computer vision. She collaborated with Prof. Yu Tsao and Prof. Hsin-Min Wang on audio-visual multimodal learning and data compression for speech enhancement. She also worked with Prof. Keh-Yih Su on a cross-lingual question answering system.

She also conducted research on humanoid robots with Prof. Tomomichi Sugihara in the Motor Intelligence group at Osaka University. Her work improved the control system of a robot arm in order to create a safer and more comfortable human-robot coworking space.

Interests
  • Speech Processing
  • Natural Language Processing
  • Computer Vision
  • Multimodal Learning
Education
  • M.Eng., Major in Computer Science, 2021 - 2022

    Cornell Tech

  • B.S., Major in Mechanical Engineering, Minor in Electrical Engineering, 2012 - 2017

    National Taiwan University

  • FrontierLab@OsakaU Program, 2016 - 2017

    Osaka University

Experience

Research Assistant, Academia Sinica
2019 – 2021, Taipei, Taiwan
  1. Audio-Visual Multimodal Learning for On-device Systems
    1. Improved system robustness against insufficient hardware or inferior sensors in a car-driving scenario via a data augmentation scheme
    2. Confirmed the effectiveness of lip images (both compressed and non-compressed) in speech enhancement tasks
    3. Minimized additional multimodal processing costs by applying an autoencoder and data quantization techniques while addressing privacy problems of facial data
    4. Significantly reduced the size of data to 0.33% without sacrificing the speech enhancement performance
  2. Cross-Lingual Movie QA (Question Answering) System
    1. Mitigated technological inequalities caused by limited data in low-resource languages
    2. Applied transfer learning to a Mandarin system by incorporating translated corpora from dominant languages
    3. Achieved zero-shot learning on Mandarin movie QA tests by using pre-trained multilingual models
  3. EMA (Electromagnetic Midsagittal Articulography) Systems
    1. Designed silent speech interfaces for patients with vocal cord disorders and for high-noise environments by jointly training with mel-spectrogram and deep feature losses
    2. Verified the effectiveness of the articulatory movement features of EMA in speech-related tasks
    3. Improved the character correct rate of automatic speech recognition by 30% in speech enhancement tasks
    4. Incorporated EMA into end-to-end speech synthesis systems and achieved 83% preference in subjective listening tests
  4. Self-Supervised Learning on Speech Enhancement
    1. Realized speech enhancement by applying a denoising autoencoder with a linear regression decoder
    2. Improved speech quality by 43% without relying on intrusive paired data
    3. Demonstrated the potential for unsupervised dereverberation
  5. Construction of Multimodal Datasets
    1. Addressed common multimodal problems of asynchronous recording devices
    2. Supervised crucial environment setups for collaborative labs, schools, and hospitals
    3. Published Taiwan Mandarin Speech with Video, an open source dataset including speech, video, and text
  6. Ported numerous existing systems from Keras, TensorFlow, and MATLAB to PyTorch and reduced processing costs by optimizing code and cleansing data into PyTorch-friendly formats
  7. Published 5 papers: 1 in a top IEEE/ACM journal and 4 at conferences
  8. Took the initiative to be server manager, paper writing mentor, journal reviewer, and internship supervisor
  9. Unified spectral and prosodic feature enhancement
Special Auditor, Osaka University
2016 – 2017, Osaka, Japan
  1. Motion of Humanlike Robot Arms
    1. Sought a safer and more comfortable human-robot environment
    2. Imitated human behaviors by applying biological statistics results
    3. Smoothed the velocity profiles and trajectories of robot arms even when external forces exist
    4. Improved a manipulator’s ability to resume motion after external forces are removed
    5. Removed the need for force detectors in the proposed control system
  2. Programming Tools of Control Theory
    1. Built a library implementing the mechanics principles of kinematics and dynamics

Projects

DTS Stock Tracking

DTS is an API-based project that assists stock trading by providing registered users with stock prices and statistical analytics.

Going Everywhere

Going Everywhere is an NFT-based art project supported by the \Art Microgrant Award. A critical characteristic of blockchain is immutability, which can be leveraged to prove that something exists. The artist collected a set of photos of an identifier (Minccino) as a record of where she existed across the physical world and launched the collection as an NFT. Once the project was put on chain, its existence became permanent in the digital world.

Improved Lite Audio-Visual Speech Enhancement (iLAVSE)

iLAVSE is a deep-learning-based audio-visual project that addresses three practical issues often encountered in implementing AVSE systems, including the requirement for additional visual data, audio-visual asynchronization, and low-quality visual data.

Taiwan Mandarin Speech with Video (TMSV)

TMSV is an audio-visual dataset based on the script of TMHINT (Taiwan Mandarin hearing in noise test).

Lite Audio-Visual Speech Enhancement (LAVSE)

LAVSE is a deep-learning-based audio-visual project that addresses additional processing costs and privacy problems.

Smooth and Flexible Movement Control of a Robot Arm

This project realizes human-like behavior on a robot arm using a nonlinear reference shaping controller, aiming to create robots that can share a safe working environment with humans.

Flyback Converter

A practice project in PCB design and soldering.

ME Robot Cup

The ME Robot Cup is a traditional annual event at the Department of Mechanical Engineering, NTU.

Awards & Honors

\Art Microgrant Award
\Art awards microgrants on a rolling basis to assist current Cornell University students pursuing artworks, art tools, or other art activities that use emerging digital technologies. The student is named a \Art Microgrant Awardee for the given school year.
Merit-Based Scholarship
Cornell Tech offers scholarship aid to a limited number of master’s degree students each year. Merit-based scholarships are offered to a select group of admitted candidates who Cornell Tech strongly believes will significantly contribute to the Computer Science community.
Travel Grant
ISCA and Interspeech 2020 jointly offered grants to students and young scientists aged 34 and under to support their participation in the Interspeech 2020 Conference. The grants aim to support diversity of scientific topics at the conference as well as wide representation of the international speech science and technology community.
JASSO Scholarship
JASSO offers scholarships to qualified international students who are accepted by a Japanese university, graduate school, junior college, college of technology (3rd year or above), or professional training college under a student exchange agreement or other short-term exchange arrangement (8 days to one year) between the Japanese school and their home higher-education institution outside Japan.
Second Prize in Mechanical Engineering Robot Cup

Skills

Languages
  • Mandarin – Native
  • English – Advanced
  • Japanese – Advanced

Programming
  • Python
  • C
  • MATLAB

Frameworks
  • PyTorch
  • Keras
  • TensorFlow

Other
  • Piano
  • Guitar
  • Darkroom