THU-IIIF Internship

Things happen to be a bit quieter for me right now, so I'm doing a fairly low-key internship with Jing-ge, who runs the IIE group. For now the work is mainly about academic papers on Transformers; I'm treating it as a way to broaden my horizons (and to keep this stretch of my resume from looking blank).

Week 1 (2024.3.18–2024.3.24)

Main tasks:
1. Read Transformer-related papers to get a rough sense of the concepts and their uses.
2. Survey large language models, and text large language models in particular.

https://transformers.run Transformers quick-start guide

1. Transformer-related papers

【list】

【Paper title】,【First author, affiliation】,【Journal/conference】,【Publication date】,【DOI link】

1. Birth of a Transformer: A Memory Viewpoint, Alberto Bietti, Flatiron Institute, NeurIPS, Sep 2023, https://doi.org/10.48550/arXiv.2306.00802 (only the arXiv DOI could be found)

2. Multimodal Learning With Transformers: A Survey, Peng Xu, Tsinghua University, IEEE TPAMI, Oct 2023, https://doi.org/10.1109/TPAMI.2023.3275156

3. A Survey on Vision Transformer, Kai Han, The University of Hong Kong, IEEE TPAMI, Feb 2022, https://doi.org/10.1109/TPAMI.2022.3152247

【notes】

【Paper title, with its number in the list above】

【Two or three sentences on what they did, how they did it, and what effect it had】

(optional)【Your own thoughts, anything goes】

1. Birth of a Transformer: A Memory Viewpoint

【What they did】

The researchers investigated how transformers balance global and context-specific knowledge, highlighting the role of weight matrices as associative memories. Their study provided insights into improving the reliability and interpretability of transformers by understanding their internal mechanisms.

【How they did it】

The researchers conducted their study by introducing a synthetic dataset that showcased how transformers learn to balance global and context-specific knowledge. They utilized a simplified two-layer transformer architecture and carefully analyzed the training process to observe the learning dynamics of global and context-specific information. Additionally, they viewed weight matrices as associative memories, providing theoretical insights on how gradients facilitate their learning during training.

【What effect it had】

By considering weight matrices as associative memories, the researchers provided insights into the learning dynamics of transformers during training. This approach shed light on the mechanisms behind the rapid learning of global information and the slower development of an “induction head” mechanism for context-specific knowledge.
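
To make the associative-memory view concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): a weight matrix stores (key, value) pairs as a sum of outer products, and multiplying it by a stored key approximately retrieves the paired value, because random high-dimensional embeddings are nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256        # embedding dimension
n_pairs = 10   # number of (key, value) associations to store

# Random high-dimensional vectors are nearly orthogonal, which is
# what keeps the stored associations from interfering with each other.
keys = rng.standard_normal((n_pairs, d)) / np.sqrt(d)
values = rng.standard_normal((n_pairs, d)) / np.sqrt(d)

# Store every pair in a single matrix: W = sum_i v_i k_i^T
W = sum(np.outer(v, k) for k, v in zip(keys, values))

# Retrieval: W @ k_i ~= v_i, plus small cross-talk from the other pairs
recalled = W @ keys[3]
cos = recalled @ values[3] / (np.linalg.norm(recalled) * np.linalg.norm(values[3]))
print(f"cosine similarity with the stored value: {cos:.3f}")  # close to 1
```

The paper's claim, as I read it, is that attention and feed-forward weight matrices behave like `W` here, and that gradient updates during training effectively write such (key, value) associations into them.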

2. Multimodal Learning With Transformers: A Survey

【What they did】

The paper conducted a survey on the application of Transformers in multimodal learning. It explored the use of Transformers as a promising neural network learner and summarized their success in various machine learning tasks. The paper also discussed the key challenges and solutions in this emerging field, along with open issues and potential research directions.

【How they did it】

The paper extensively investigated the design, training, and applications of Transformers in multimodal learning. It provided in-depth analysis of how Transformers excel in handling multimodal data and learning relationships between different modalities. The survey encompassed different types of Transformer models, application domains, challenges, and future research directions.

【What effect it had】

The paper delivers a comprehensive overview of Transformers in multimodal learning, walking through their design, training, and applications, and showing their strengths in processing multimodal data and in learning relationships across modalities. By laying out open issues and candidate research directions, it also gives researchers and practitioners concrete starting points for further work.
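
As a toy illustration of what "learning relationships between different modalities" looks like at the layer level, here is a single cross-attention block in PyTorch; the sizes and the text/image pairing are my own arbitrary choices, not an architecture from the survey.

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 4        # illustrative sizes
batch, text_len, n_patches = 2, 12, 49

# One cross-attention layer: text tokens act as queries and attend
# over image patch embeddings (keys/values), fusing the two modalities.
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

text = torch.randn(batch, text_len, d_model)    # e.g. caption token embeddings
image = torch.randn(batch, n_patches, d_model)  # e.g. ViT patch features

fused, weights = cross_attn(query=text, key=image, value=image)
print(fused.shape)    # torch.Size([2, 12, 256]) -- text enriched with visual context
print(weights.shape)  # torch.Size([2, 12, 49])  -- attention averaged over heads
```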

3. A Survey on Vision Transformer

【What they did】

The paper provides a comprehensive overview of recent advances in vision transformers, categorizing them based on application scenarios such as backbone network, high/mid-level vision, low-level vision, and video processing.

【How they did it】

It delves into the formulation of standard transformer models and the self-attention mechanism, emphasizing the pivotal role of self-attention in transformer architectures for vision tasks.

【What effect it had】

By comparing transformer-based models to convolutional and recurrent neural networks in visual benchmarks, the paper demonstrates the high performance and versatility of transformer models in computer vision tasks, attracting increasing attention from the computer vision community.
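
The standard formulation the survey starts from, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, takes only a few lines; below is a generic single-head sketch (my own, with toy ViT-like sizes), not code from the paper.

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # pairwise token affinities
    weights = torch.softmax(scores, dim=-1)        # each row sums to 1
    return weights @ v

# Illustrative: 49 "patch tokens" of dimension 64, as in a tiny ViT
n_tokens, d = 49, 64
x = torch.randn(n_tokens, d)
w_q, w_k, w_v = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([49, 64])
```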


2. Survey of large language models, and of text large language models

Technical focus: PDF OCR parsing
Possible searches: "text large language model" 【this keyword can be expanded】 filetype:pdf (sources including but not limited to WeChat official accounts, Douyin, Bilibili, YouTube, Instagram, Twitter, etc.)

https://zhuanlan.zhihu.com/p/614766286 Roundup of large language model surveys (mostly conceptual introductions)
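
As a starting point for the PDF OCR parsing focus above, here is a minimal sketch of the usual two-step approach; the library choices (PyMuPDF plus pytesseract) and the scanned-page heuristic are my own assumptions, not anything the task prescribes: read the embedded text layer when one exists, and fall back to OCR for scanned pages.

```python
import io

import fitz         # PyMuPDF: pip install pymupdf
import pytesseract  # needs the Tesseract binary installed separately
from PIL import Image

def extract_pdf_text(path: str, min_chars: int = 20) -> list[str]:
    """Return the text of each page, OCRing pages with no usable text layer."""
    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text().strip()
            if len(text) < min_chars:
                # Almost no embedded text: likely a scanned page,
                # so rasterize it and fall back to OCR.
                pix = page.get_pixmap(dpi=200)
                img = Image.open(io.BytesIO(pix.tobytes("png")))
                text = pytesseract.image_to_string(img, lang="chi_sim+eng")
            pages.append(text)
    return pages

if __name__ == "__main__":
    for i, page_text in enumerate(extract_pdf_text("example.pdf"), start=1):
        print(f"--- page {i} ---")
        print(page_text[:200])
```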
