(the later the better)
Date: June 2023
This paper implements a Stable Diffusion-style model for text-to-image generation. The model uses a U-Net backbone augmented with cross-attention layers that incorporate text embeddings from the CLIP ViT-B/32 encoder. Evaluated on the MNIST dataset with digit strings as input, the approach emphasizes time efficiency while highlighting a trade-off in image quality compared to a DCGAN baseline.
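As a rough illustration of the conditioning mechanism described above, the sketch below shows how a U-Net feature map could attend to CLIP ViT-B/32 text token embeddings through a cross-attention layer. The class name, dimensions, and wiring are assumptions made for illustration, not the exact implementation.

```python
# Hypothetical cross-attention block: each spatial position of a U-Net feature
# map queries the CLIP text tokens. Dimensions and names are illustrative.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, feat_dim, text_dim=512, n_heads=4):
        super().__init__()
        # CLIP ViT-B/32 text features are 512-dimensional
        self.attn = nn.MultiheadAttention(feat_dim, n_heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x, text_emb):
        # x: (B, C, H, W) feature map; text_emb: (B, T, 512) text token embeddings
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)               # (B, H*W, C): one query per pixel
        out, _ = self.attn(self.norm(q), text_emb, text_emb)
        return (q + out).transpose(1, 2).reshape(b, c, h, w)  # residual connection

# Example: a 64-channel 7x7 feature map conditioned on 77 text tokens
# block = CrossAttention(feat_dim=64)
# y = block(torch.randn(2, 64, 7, 7), torch.randn(2, 77, 512))
```

In a diffusion U-Net of this kind, blocks like this are typically interleaved with the convolutional and self-attention blocks at several resolutions.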
Date: June 2024
This paper explores methods for stabilizing the training of complex DQN models. Built on a U-ViT-based architecture with an added memory component, the approach addresses convergence issues through techniques such as exponential-moving-average (EMA) updates of the target network and an enriched state representation. Experiments in the CarRacing-V2 environment demonstrate improved performance and training stability.
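For reference, the snippet below is a minimal sketch of the EMA target-network update mentioned above; the function name, the tau value, and the surrounding loop are illustrative assumptions rather than the paper's exact code.

```python
import torch

@torch.no_grad()
def ema_update(online_net, target_net, tau=0.005):
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(p_online, alpha=tau)

# Typical usage inside the training loop, after each optimizer step on the
# online Q-network (target_net starts as a weight-for-weight copy of online_net):
#     loss.backward(); optimizer.step()
#     ema_update(online_net, target_net)
```

Compared with copying the online weights every N steps, the smooth update keeps the bootstrapped Q-targets from shifting abruptly, which is one common way to damp divergence.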
Date: June 2024
This paper presents a solution to a deep learning competition on video frame prediction and semantic segmentation. The model consists of three parts: a JEPA encoder, a predictor, and a semantic decoder. A U-Net architecture strengthens the semantic decoder, whose predictions on the abundant unlabeled data are fed back as inputs to the JEPA encoder, improving feature extraction and overall performance. The provided dataset contains many unlabeled videos along with a smaller set of labeled samples, making it well suited to self-supervised and semi-supervised learning. If we were to tackle this challenge again, we would explore V-JEPA to further improve performance and efficiency.
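To make the three-part structure concrete, here is a rough skeleton of how such an encoder, predictor, and U-Net-style decoder could be wired together; all module internals, names, and tensor shapes are placeholders, not the competition model itself.

```python
# Placeholder skeleton: none of the module internals reflect the actual model.
import torch
import torch.nn as nn

class FramePredictionModel(nn.Module):
    def __init__(self, encoder: nn.Module, predictor: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # JEPA-style encoder: frame -> latent vector
        self.predictor = predictor  # maps past latents to the future frame's latent
        self.decoder = decoder      # U-Net-style decoder: latent -> segmentation logits

    def forward(self, context_frames):
        # context_frames: (B, T, C, H, W) observed video frames
        b, t, c, h, w = context_frames.shape
        latents = self.encoder(context_frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        future_latent = self.predictor(latents)   # latent state of the unseen future frame
        return self.decoder(future_latent)        # per-pixel class logits for that frame
```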
Date: December 2024
This paper tackles fairness issues in face recognition by exploring self-supervised learning techniques. A key contribution is a dynamic masking method within a Masked Autoencoder framework, referred to as sensitivity-aware masking. This approach dynamically adjusts the masking strategy by analyzing attention heatmaps to identify and prioritize sensitive regions in face images. In doing so, the model learns to either focus on or de-emphasize these regions, effectively mitigating biases related to age, gender, and race.
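The sketch below illustrates one plausible form of attention-guided dynamic masking, in which patches carrying more attention mass are masked more often; the function name, the sampling rule, and the 75% mask ratio are assumptions, not the paper's exact procedure.

```python
import torch

def sensitivity_aware_mask(attn_scores, mask_ratio=0.75):
    """attn_scores: (B, N) attention mass per image patch (higher = more sensitive).
    Returns a boolean mask of shape (B, N) where True marks a masked patch."""
    b, n = attn_scores.shape
    n_mask = int(n * mask_ratio)
    # Sample patches to mask with probability proportional to their attention,
    # so sensitive facial regions are hidden (and must be reconstructed) more often.
    probs = attn_scores / attn_scores.sum(dim=1, keepdim=True)
    idx = torch.multinomial(probs, n_mask, replacement=False)   # (B, n_mask)
    mask = torch.zeros(b, n, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask
```

Sampling with inverted weights would instead tend to preserve the sensitive regions, matching the "focus on" direction mentioned above.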
Date: March 2022
This paper revisits the classical St. Petersburg Paradox through a detailed analysis using Monte Carlo simulations. It demonstrates that the theoretically infinite expected value is practically unattainable in single plays, and examines how finite sample sizes affect actual payoffs. The findings offer fresh insights into risk preferences and decision-making under uncertainty.
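A minimal Monte Carlo sketch of the game is shown below for reference; it uses the standard payoff rule (2^k when the first tails appears on flip k), while the sample sizes are arbitrary choices rather than those used in the paper.

```python
import random

def play_once():
    """One St. Petersburg game: flip a fair coin until tails; the payoff is 2**k,
    where k is the flip on which the first tails appears."""
    k = 1
    while random.random() < 0.5:   # heads, keep flipping
        k += 1
    return 2 ** k

def average_payoff(n_games):
    return sum(play_once() for _ in range(n_games)) / n_games

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        # The empirical mean grows only roughly like log2(n), far below the
        # infinite theoretical expectation.
        print(n, average_payoff(n))
```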