Curiosity-Driven Reinforcement Learning from Human Feedback (CD-RLHF): An AI Framework that Mitigates the Diversity...
Large Language Models (LLMs) have become increasingly reliant on Reinforcement Learning from...
Memorization vs. Generalization: How Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) Shape Foundation Model...
Modern AI systems rely heavily on post-training techniques like supervised fine-tuning (SFT)...
Meta AI Proposes EvalPlanner: A Preference Optimization Algorithm for Thinking-LLM-as-a-Judge
The rapid advancement of Large Language Models (LLMs) has significantly improved their...
Agentic AI: The Foundations Based on Perception Layer, Knowledge Representation and Memory Systems
Agentic AI stands at the intersection of autonomy, intelligence, and adaptability, offering...
From Deep Knowledge Tracing to DKT2: A Leap Forward in Educational AI
Knowledge Tracing (KT) plays a crucial role in Intelligent Tutoring Systems (ITS) by modeling students’ knowledge states and predicting their future performance. Traditional KT...
Baidu Research Introduces EICopilot: An Intelligent Agent-based Chatbot to Retrieve and Interpret Enterprise Information...
Knowledge graphs have recently seen extensive use in enterprise settings,...
Open Thoughts: An Open Source Initiative Advancing AI Reasoning with High-Quality Datasets and Models...
The critical issue of restricted access to high-quality reasoning datasets has limited...
Decoupling Tokenization: How Over-Tokenized Transformers Redefine Vocabulary Scaling in Language Models
Tokenization plays a fundamental role in the performance and scalability of Large...
Quantization Space Utilization Rate (QSUR): A Novel Post-Training Quantization Method Designed to Enhance the...
Post-training quantization (PTQ) focuses on reducing the size and improving the speed...
YuE: An Open-Source Music Generation AI Model Family Capable of Creating Full-Length Songs with...
Significant progress has been made in short-form instrumental compositions in AI and...