Researchers from Fudan University and Shanghai AI Lab Introduce DOLPHIN: A Closed-Loop Framework for Automating Scientific Research with Iterative Feedback


Artificial Intelligence (AI) is changing how scientific discoveries are made. By accelerating data analysis, computation, and idea generation, it is shaping a new research paradigm. The long-term goal for some researchers is a system that can complete the entire research cycle without human involvement. Such systems could raise productivity and help people make progress on difficult problems.

Hypothesis generation, experiment execution, and data validation are often slow because each step depends on manual human effort. Progress also stalls when ideas cannot be refined through iterative feedback from experiments. Closing that feedback loop matters because it can lead to quicker and more accurate findings in scientific studies.

Several tools have been developed to automate parts of the research process. GPT-researcher and AI-Scientist can break tasks into simpler subtasks, help generate ideas, and perform some computation. However, no integrated framework yet feeds experimental results back into the research cycle. Moreover, most current tools rely on small datasets or pre-defined workflows, which limits their ability to carry out open-ended research tasks.

Researchers from Fudan University and the Shanghai Artificial Intelligence Laboratory have developed DOLPHIN, a closed-loop auto-research framework covering the entire scientific research process. The system generates ideas, executes experiments, and incorporates feedback to refine subsequent iterations. DOLPHIN aims for higher efficiency and accuracy by ranking task-specific literature and by debugging generated code with guidance from exception tracebacks. This end-to-end approach distinguishes it from other tools for autonomous research.

DOLPHIN's methodology is divided into three interconnected stages: idea generation, experimental verification, and results feedback. In the first stage, the system retrieves research papers on a topic and ranks them by relevance to the task and to topic attributes, keeping only the most applicable references. Using the selected references, DOLPHIN generates novel, independent research ideas. It then filters the ideas for redundancy: each idea is embedded with a sentence-transformer model, and ideas that are too similar to one another by cosine similarity are removed.
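The redundancy filter can be illustrated with a minimal sketch. This is not DOLPHIN's code: the function names, the 0.8 threshold, and the sample ideas are assumptions made for illustration, and a bag-of-words vector stands in for the sentence-transformer embedding so that the example runs with the standard library alone.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence-transformer: word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant_ideas(ideas, threshold=0.8, embed=toy_embed):
    """Keep an idea only if it is not too similar to any idea already kept.

    The threshold value is an assumption, not a figure from the article.
    """
    kept, kept_vectors = [], []
    for idea in ideas:
        vector = embed(idea)
        if all(cosine_similarity(vector, kv) < threshold for kv in kept_vectors):
            kept.append(idea)
            kept_vectors.append(vector)
    return kept


# Illustrative ideas (invented for this sketch).
ideas = [
    "add label smoothing to the classification loss",
    "add label smoothing to the classification loss function",
    "use a cosine learning rate schedule with warmup",
]
unique_ideas = drop_redundant_ideas(ideas)
```

Here the second idea is dropped as a near-duplicate of the first, while the third survives. In a real pipeline, `toy_embed` would be replaced by dense sentence embeddings; the greedy keep-first strategy is one simple design choice among several.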

Once ideas are finalized, DOLPHIN transitions to experimental verification. It automatically generates and debugs code using an exception-traceback-guided process. This involves analyzing error messages and their related code structure to make corrections efficiently. Experiments proceed iteratively, with results categorized as improvements, maintenance, or declines. Successful outcomes are incorporated into future cycles, enhancing idea generation quality over time.
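The debugging loop and the three-way result categories can be sketched as follows. This is a simplified illustration under stated assumptions, not DOLPHIN's implementation: `propose_fix` is a hypothetical callable standing in for the language model that reads the traceback, and the attempt limit and the 0.1-point tolerance in `categorize` are invented values.

```python
import traceback


def run_with_traceback(source):
    """Run source code; return (namespace, None) or (None, traceback text)."""
    namespace = {}
    try:
        exec(compile(source, "<experiment>", "exec"), namespace)
        return namespace, None
    except Exception:
        return None, traceback.format_exc()


def debug_loop(source, propose_fix, max_attempts=3):
    """Re-run the code, feeding each traceback to propose_fix, until it passes.

    propose_fix(source, traceback_text) -> new source is a hypothetical
    interface; in a real system a language model would fill this role.
    """
    for attempt in range(1, max_attempts + 1):
        namespace, error = run_with_traceback(source)
        if error is None:
            return namespace, attempt
        source = propose_fix(source, error)
    return None, max_attempts


def categorize(new_score, baseline_score, tolerance=0.1):
    """Label a result relative to the baseline (tolerance is an assumption)."""
    if new_score > baseline_score + tolerance:
        return "improvement"
    if new_score < baseline_score - tolerance:
        return "decline"
    return "maintenance"


def toy_fix(source, error_text):
    """Toy fixer: corrects one known misspelling when a NameError is reported."""
    if "NameError" in error_text:
        return source.replace("acuracy", "accuracy")
    return source


buggy_source = "accuracy = 82.0\nresult = acuracy\n"
namespace, attempts = debug_loop(buggy_source, toy_fix)
```

The first run fails with a `NameError`, the toy fixer patches the misspelled name, and the second run succeeds. A label from `categorize` could then decide whether an idea is fed back into the next round of idea generation.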

DOLPHIN was tested on three benchmark tasks: image classification using CIFAR-100, 3D point cloud classification with ModelNet40, and sentiment classification using SST-2. In image classification, DOLPHIN improved baseline models like WideResNet by up to 0.8%, achieving a top-1 accuracy of 82.0%. For 3D point cloud classification, the system outperformed human-designed methods such as PointNet, achieving an overall accuracy of 93.9%, a 2.9% improvement over baseline models. In sentiment classification, DOLPHIN improved accuracy by 1.5%, narrowing the gap between BERT-base and BERT-large performance. These results suggest that DOLPHIN can produce ideas competitive with state-of-the-art methods across diverse datasets and tasks.

Another notable feature of DOLPHIN is that it becomes more efficient across research iterations. In the first iteration, it produced 20 ideas, of which 19 were judged novel, at an average cost of $0.184 per idea. By the third iteration, the closed loop had improved both idea quality and the rate at which experiments executed successfully. The debugging success rate rose from 33.3% to 50.0% once structured feedback on earlier errors was incorporated. This iterative improvement illustrates the value of DOLPHIN's closed-loop design for automating and optimizing the research process.
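As a quick worked check, the figures above can be turned into a few derived quantities. The inputs are the numbers quoted in this article; the derived values assume that the average cost applies to all 20 ideas and that the debugging rates are percentages of attempted runs.

```python
# Figures quoted in the article.
ideas_generated = 20
ideas_novel = 19
cost_per_idea_usd = 0.184
debug_success_before = 33.3  # percent
debug_success_after = 50.0   # percent

# Derived quantities (simple arithmetic under the assumptions stated above).
novelty_rate = ideas_novel / ideas_generated                      # 0.95
total_cost_usd = round(ideas_generated * cost_per_idea_usd, 2)    # 3.68
absolute_gain_points = round(debug_success_after - debug_success_before, 1)  # 16.7
relative_gain = round(absolute_gain_points / debug_success_before, 3)        # 0.502

print(f"novelty rate: {novelty_rate:.0%}")
print(f"total idea cost: ${total_cost_usd:.2f}")
print(f"debugging gain: {absolute_gain_points} points ({relative_gain:.1%} relative)")
```

In other words, 95% of the first-iteration ideas were judged novel for roughly $3.68 in total, and the debugging success rate rose by 16.7 percentage points, or about half again over its starting value.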

DOLPHIN represents a significant leap forward in AI-driven research by addressing key inefficiencies in traditional scientific workflows. Its ability to integrate literature review, idea generation, experimentation, and feedback into a seamless cycle demonstrates its potential for advancing scientific discovery. The framework improves efficiency and achieves results comparable to or exceeding those of human-designed systems. This positions DOLPHIN as a promising tool for addressing complex scientific challenges and fostering innovation in various domains.





Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
