VizWiz Research Project
VQA app: Developed a demo that captures a photo and a spoken question, then applies speech-to-text (DeepSpeech) and image quality detection algorithms. (3/2019-6/2019)
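One common image quality check is scoring sharpness as the variance of the Laplacian response, where low variance suggests a blurry photo; the demo's actual quality detection method isn't specified here, so this is an illustrative sketch in plain Python:

```python
def laplacian_variance(img):
    """Sharpness score: variance of the 4-neighbour Laplacian response.

    img is a 2D list of grayscale intensities; low scores suggest blur.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

# A flat image scores 0; an image containing a sharp edge scores higher.
flat = [[128] * 6 for _ in range(6)]
edge = [[0] * 3 + [255] * 3 for _ in range(6)]
```

In practice a fixed threshold on this score can flag photos as too blurry to answer.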
Question Answerability: Extracted visual features with OpenCV and the Azure API, and text features with NLTK, to predict whether a visual question is answerable.
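The exact text features aren't listed; as an illustrative sketch (using plain string handling rather than NLTK), simple lexical cues such as question length and WH-word type are the kind of signal such a predictor can use:

```python
WH_WORDS = {"what", "which", "who", "where", "when", "why", "how"}

def question_features(question):
    """Toy lexical features for answerability prediction (illustrative only)."""
    tokens = question.lower().rstrip("?").split()
    return {
        "num_tokens": len(tokens),
        "starts_with_wh": bool(tokens) and tokens[0] in WH_WORDS,
        "asks_color": "color" in tokens or "colour" in tokens,
        "asks_text": any(t in {"say", "says", "read", "written"} for t in tokens),
    }

feats = question_features("What color is this shirt?")
```

Features like these would then be fed, together with the visual features, into a standard classifier.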
Master's Thesis: Answered visual questions with external knowledge (knowledge bases, reverse image search, and image search by text). The results show that including external knowledge can substantially improve VQA accuracy, and suggest the possibility of answering questions that crowd workers labeled as unanswerable. (6/2019-6/2020)
VQA tutorials: Wrote a GitHub page summarizing recent advances in VQA, and wrote VQA tutorial articles on Zhihu (a Chinese Q&A site similar to Quora).
VQA algorithms: Applied state-of-the-art VQA algorithms, e.g., MCAN with grid features and Pythia 3. Currently studying fusion methods and attention mechanisms for VQA. (6/2020-present)
VQA Crowdsourcing: Building the VizWiz-Visual Grounding dataset with Amazon Mechanical Turk. (9/2020-present)
Our Image & Video Computing Group
Course Project for Natural Language Generation:
Chest disease classification and visual grounding: Applied a hard attention model and a stand-alone self-attention model to extract features from chest X-ray radiology images. Used multi-task learning and contrastive learning to train the model to learn from radiomics features, predict pneumonia, and ground the diseased areas.
Radiology report generation: Reproduced the paper "When Radiology Report Generation Meets Knowledge Graph": used DenseNet-121 to extract image features, built a knowledge graph via a GCN, and generated reports via a multi-level LSTM.
Course Proj for AI in Health
Built COVID-19 Knowledge Graph: Used BioBERT and PubTator for biomedical named entity recognition on the PubMed dataset and the COVID-19 44K dataset. Built coronavirus-related knowledge graphs using Gephi.
Integrated the coronavirus knowledge graph with the KG from the Data2Discovery company.
Built Tutorials: Built BioBERT and SQL tutorials for the PubMed and MIMIC-III datasets, plus tutorials on knowledge graph mining algorithms.
Paper 1 | Paper 2
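As a minimal sketch of the graph-building step (the real pipeline used BioBERT/PubTator output and Gephi; the entities and relations below are hypothetical, for illustration only), extracted (head, relation, tail) triples can be stored as an adjacency map:

```python
from collections import defaultdict

def build_graph(triples):
    """Adjacency map: entity -> list of (relation, neighbour) pairs."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph

def neighbours(graph, entity):
    """Entities directly reachable from `entity`, ignoring relation labels."""
    return [tail for _, tail in graph.get(entity, [])]

# Hypothetical triples standing in for NER/relation-extraction output.
triples = [
    ("SARS-CoV-2", "causes", "COVID-19"),
    ("SARS-CoV-2", "binds", "ACE2"),
    ("COVID-19", "symptom", "fever"),
]
kg = build_graph(triples)
```

A structure like this can then be exported (e.g. as an edge list) for visualization in Gephi.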
Course Proj for Advanced Programming Tools: ByteMe
Developed the ByteMe application for both Web (frontend: HTML + JS + Ajax; backend: Python + Flask) and mobile platforms (React Native and Kotlin).
Built, deployed, and managed the application on Google App Engine; wrote a Python database API for MongoDB; and developed navigation, camera, and user login features with Google Firebase.
Implemented the “NewByte” page with an AutoFill feature, using a Food-101 classification model based on Google's Inception V3 and the Azure API.
Website | App made with Kotlin | App made with React Native
Research Proj: Evaluation of Mental Stress and Heart Rate Variability Derived from Wrist-Based Photoplethysmography
Designed a stress-induction experiment. Collected and filtered ECG and wrist-based PPG signals and assessed signal quality. Designed peak-finding algorithms for PPG and ECG.
Calculated heart rate variability to classify stress states. The overall leave-one-participant-out accuracy of wrist-based PPG with a 3-minute temporal window reaches 80%.
Paper | Poster | Award
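A minimal sketch of the two signal-processing steps above, assuming simple threshold-based peak picking and RMSSD as the HRV metric (the project's actual algorithms and features may differ):

```python
def find_peaks(signal, threshold):
    """Indices of local maxima above threshold (e.g. R-peaks or PPG pulses)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]

def rmssd(rr_ms):
    """Root mean square of successive differences between beat intervals (ms),
    a standard time-domain HRV measure."""
    diffs = [rr_ms[i + 1] - rr_ms[i] for i in range(len(rr_ms) - 1)]
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5
```

Peak indices divided by the sampling rate give beat times; successive differences of those give the RR intervals fed to the HRV measure.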
Course Proj for HCI: Understanding Health-related Information Searching Behavior Through Eye Tracking
Collected eye-tracking data (AOIs, TTFF, etc.) using Tobii TX300 eye-tracker and iMotions.
Analyzed data using the Kruskal-Wallis test, one-way ANOVA, and the Mann-Whitney U test. (Paper)
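For reference, the Mann-Whitney U statistic used in this analysis counts, over all cross-sample pairs, how often one group's value exceeds the other's, with ties counting half; a small sketch:

```python
def mann_whitney_u(a, b):
    """U statistic for sample a against sample b; ties count as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

In practice a stats package (e.g. SciPy's `mannwhitneyu`) also supplies the p-value; this only shows what the statistic measures.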
Course Proj for Activities Recognition: Activities Recognition in Self-Driving Car
Collected data on five activities from ten people to address the take-over problem in self-driving cars. Reduced individual differences, built a pose estimator to detect people's skeletons, and extracted secondary features to help classify similar activities, ensembling them with an LSTM. (Paper)
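One kind of secondary feature derivable from a pose estimator's skeleton keypoints (an illustrative assumption; the project's actual features aren't specified) is the angle at a joint, which can separate activities whose raw keypoint positions look similar:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by keypoints a-b-c (e.g. an elbow),
    each keypoint given as an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

Per-frame angles like this, stacked over time, form a sequence a downstream LSTM can classify.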
Course Proj for Visual Environment
Summary: We used Unity 3D to build a virtual presentation demo.
Why: Our design can help people with presentation anxiety and improve presentation skills. It also provides a solution for remote meetings.
Details: We designed different human-human interactions/attitudes for the virtual audience. For a positive attitude, some virtual audience members imitate the speaker's actions while the speaker is running an experiment, and some always pay attention to the speaker by turning their bodies toward the speaker.
For a passive attitude, the audience simply ignores the speaker.
Besides, we designed different human-object interactions: interacting with slides, popping up details of a displayed item when the user gets close to it, etc.
Research Proj: 2017 Mathematical Contest in Modeling - "Cooperate and navigate"
Summary: We analyzed the effects of allowing self-driving, cooperating cars on the roads in several counties in the U.S., suggested the best percentage of self-driving cars, and proposed policy changes such as setting exclusive lanes.
Why: Self-driving, cooperating cars have been proposed as a solution to increase highway capacity without increasing the number of lanes or roads. How these cars interact with the existing traffic flow and with each other is not yet well understood.
Details: We built a Phantom Traffic Jam Model to simulate traffic jams on highways with few intersections and accidents, and created a Smart Driver Model with versions for both human drivers and smart cars.
We predicted traffic conditions under varied road densities and smart-car proportions.
We built a Global Decision Model to control the smart-car proportion and provide optimal route plans for both human drivers and smart cars.
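A toy version of the car-following idea behind such a simulation (a sketch under simplified assumptions, not the contest model): cars on a ring road accelerate toward a speed limit but back off as the gap to the car ahead shrinks, which is the mechanism that lets phantom jams form and dissolve.

```python
def step(positions, speeds, road_len, v_max, dt=0.1, safe_gap=2.0):
    """One update of a simple follow-the-leader model on a circular road.
    positions are assumed sorted along the ring, so car i+1 is ahead of car i."""
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # Gap to the car ahead, wrapping around the ring road.
        gap = (positions[(i + 1) % n] - positions[i]) % road_len
        # Desired speed shrinks with the gap, capped at the speed limit.
        target = min(v_max, max(0.0, gap - safe_gap))
        # Relax halfway toward the target speed each step.
        new_speeds.append(speeds[i] + 0.5 * (target - speeds[i]))
    new_positions = [(positions[i] + new_speeds[i] * dt) % road_len
                     for i in range(n)]
    return new_positions, new_speeds

# Three evenly spaced cars starting from rest on a 30-unit ring.
pos, spd = [0.0, 10.0, 20.0], [0.0, 0.0, 0.0]
for _ in range(50):
    pos, spd = step(pos, spd, road_len=30.0, v_max=5.0)
```

Varying the car density and mixing in drivers with different reaction rules is then a natural way to study the density/proportion effects described above.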