My research interests mainly revolve around RLHF, RLVR, AIGC, and 3D Human Modeling. I am actively seeking (1) Ph.D. positions for Fall 2027 and (2) other research collaborations related to RL or AIGC. Feel free to reach out via 📧 xiaofengtan@seu.edu.cn or 💬 WeChat: txf_06_20. I'd be happy to connect 😊. And if you'd like to talk about Doraemon 😺, that works too. For more details, please check my CV / 中文简历.
👀 News
Mar 31, 2026
🎉 I have joined Tencent Hunyuan (混元) as a Research Intern! Looking forward to exploring RLHF and AIGC.
Text-to-motion generation is essential for advancing the creative industry, yet it often struggles to produce consistent, realistic motions. To address this, we focus on fine-tuning text-to-motion models to consistently favor high-quality, human-preferred motions, a critical yet largely unexplored problem. In this work, we theoretically investigate DPO under both online and offline settings and reveal their respective limitations: overfitting in offline DPO and biased sampling in online DPO. Building on these theoretical insights, we introduce Semi-online Preference Optimization (SoPo), a DPO-based method for training text-to-motion models on "semi-online" data pairs, each consisting of an unpreferred motion sampled from the online distribution and a preferred motion drawn from offline datasets. This method leverages both online and offline DPO, allowing each to compensate for the other's limitations. Extensive experiments demonstrate that SoPo outperforms other preference alignment methods, achieving MM-Dist improvements of 3.25% (vs. 0.76% for MoDiPO) on the MLD model and 2.91% (vs. 0.66% for MoDiPO) on the MDM model. Moreover, the MLD model fine-tuned with our SoPo surpasses the SoTA model in terms of R-precision and MM-Dist. Visualization results further confirm the efficacy of SoPo in preference alignment.
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
Xiaofeng Tan*, Wanjiang Weng*, Haodong Lei, and Hongsong Wang
🏆 ICLR 2026 | International Conference on Learning Representations (CORE-A*)
In recent years, motion generative models have advanced significantly, yet they still struggle to align with downstream objectives. Recent studies have shown that using differentiable rewards to directly align the preferences of diffusion models yields promising results. However, these methods suffer from inefficient, coarse-grained optimization with high memory consumption. In this work, we first theoretically identify the fundamental cause of these limitations: the recursive dependence between different steps along the denoising trajectory. Inspired by this insight, we propose EasyTune, which fine-tunes the diffusion model at each denoising step rather than over the entire trajectory. This decouples the recursive dependence, allowing us to perform (1) dense and effective, (2) memory-efficient, and (3) fine-grained optimization. Furthermore, the scarcity of preference motion pairs limits the training of motion reward models. To this end, we further introduce a Self-refinement Preference Learning (SPL) mechanism that dynamically identifies preference pairs and conducts preference learning on them. Extensive experiments demonstrate that EasyTune outperforms ReFL by 62.1% in MM-Dist improvement while requiring only 34.5% of its additional memory overhead.
ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment
Wanjiang Weng*, Xiaofeng Tan*, Junbo Wang, Guo-Sen Xie, Pan Zhou, and Hongsong Wang
🏆 AAAI 2026 | AAAI Conference on Artificial Intelligence (CCF-A, CORE-A*)
Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have been shown to generate more diverse and realistic motions. However, a misalignment exists between the text and motion distributions in diffusion models, which leads to semantically inconsistent or low-quality motions. To address this limitation, we propose Reward-guided sampling Alignment (ReAlign), comprising a step-aware reward model that assesses alignment quality during denoising sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. The reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency with a motion-aligned module for realism, refining noisy motions at each timestep to balance probability density and alignment. Extensive experiments on both motion generation and retrieval tasks demonstrate that our approach significantly improves text-motion alignment and motion quality compared with existing state-of-the-art methods.
Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls
Can Gao (Supervisor), Xiaofeng Tan*, Jie Zhou, Weiping Ding, and Witold Pedrycz
🏆 TKDE 2025 | IEEE Transactions on Knowledge and Data Engineering (CCF-A, SCI-Q1)
Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and applied in a variety of practical tasks. However, most unsupervised outlier detection methods are carefully designed to detect specific types of outliers, while real-world data may be entangled with outliers of different types. In this study, we propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers. Specifically, a novel fuzzy rough sets-based method that integrates relative fuzzy granule density is first introduced to improve the capability of detecting local outliers. Then, a multi-scale view generation method based on granular-ball computing is proposed to collaboratively identify group outliers at different levels of granularity. Moreover, reliable outliers and inliers determined by the three-way decision are used to train a weighted support vector machine to further improve detection performance. The proposed method innovatively transforms unsupervised outlier detection into a semi-supervised classification problem and, for the first time, explores fuzzy rough sets-based outlier detection from the perspective of multi-scale granular balls, allowing for high adaptability to different types of outliers. Extensive experiments on both artificial and UCI datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods, improving results by at least 8.48% in terms of the Area Under the ROC Curve (AUROC).
Preprints
MotionRFT: Unified Reinforcement Fine-Tuning for Text-to-Motion Generation
Xiaofeng Tan, Wanjiang Weng, Hongsong Wang, Fang Zhao, Xin Geng, and Liang Wang
Text-to-motion generation has advanced rapidly with diffusion- and flow-based generative models, yet supervised pre-training remains insufficient to align models with high-level objectives such as semantic consistency, realism, and human preference. We present a reinforcement fine-tuning framework comprising a heterogeneous-representation, multi-dimensional reward model, MotionReward, and an efficient, fine-grained fine-tuning strategy, EasyTune. Extensive experiments demonstrate strong cross-model and cross-representation generalization, achieving an FID of 0.132 with 22.10 GB peak memory and saving up to 15.22 GB compared with DRaFT.
Diversity Collapse despite Constant Entropy: Understanding Entropy in Flow-based RL
Xiaofeng Tan, Jun Liu, Bin-Bin Gao, Yuanting Fan, Xi Jiang, Chengjie Wang, Hongsong Wang, and Feng Zheng
Video anomaly detection is an essential yet challenging open-set task in computer vision, often addressed by leveraging reconstruction as a proxy task. However, existing reconstruction-based methods face challenges in two main aspects: (1) limited model robustness in open-set scenarios, and (2) an overemphasis on, but restricted capacity for, detailed motion reconstruction. To this end, we propose a novel frequency-guided diffusion model with perturbation training, which enhances model robustness through perturbation training and emphasizes the principal motion components guided by motion frequencies. Specifically, we first use a trainable generator to produce perturbative samples for perturbation training of the diffusion model. During the perturbation training phase, model robustness is enhanced and the domain of the reconstruction model is broadened by training against this generator. Subsequently, perturbative samples are introduced at inference, affecting the reconstruction of normal and abnormal motions differently and thereby enhancing their separability. Considering that motion details originate from high-frequency information, we propose a masking method based on the 2D discrete cosine transform to separate high-frequency from low-frequency information. Guided by the high-frequency information of the observed motion, the diffusion model can focus on generating low-frequency information and thus reconstruct the motion accurately. Experimental results on five video anomaly detection datasets, including human-related and open-set benchmarks, demonstrate the effectiveness of the proposed method. The code will be released to the public.
🌟 Selected Honors
Tencent Scholarship (2025)
Top 2.5% at Southeast University (¥9,000)
First-Class Academic Scholarship (2025)
Top 10% at Southeast University (¥12,000)
Outstanding Graduate Representative (2024)
Speaker at the graduation ceremony
Honors Bachelor's Degree (2024)
Top 3% at Shenzhen University
Outstanding Graduate (2024)
Top 5% at Shenzhen University
Star of Liyuan (2022, 2023)
Top 0.73% at Shenzhen University ("Liyuan" refers to Shenzhen University, ¥20,000)
Scholarship for Outstanding Innovative Talent (2020)
Top 2% in the Guangdong Province science-track college entrance exam (¥20,000)
💡 Teaching Assistant
Fundamentals of Programming (2022 Fall)
2022 Mathematics and Computer Science Special Class, WeBank Fintech Class
Data Structures (2022 Fall, 2023 Fall)
2021 and 2022 Computer Science Classes
Object-Oriented Programming (2023 Spring, 2024 Spring)
2022 Mathematics and Computer Science Special Class, WeBank Fintech Class, 2023 Computer Science Class
Computer Systems (2023 Spring)
2022 Mathematics and Computer Science Special Class
Introduction to Computer Science (2023 Fall)
General elective course
Operating Systems (2024 Spring)
2021 Computer Science Class
📋 Service
Reviewer: ICML 2026, NeurIPS 2026
🏠 Life
I have a passion for exploring the underlying principles of the world, not only in my research but also across natural sciences, engineering, and social humanities. Beyond academics, I find quiet joy and inspiration in the arts. I practice hard-pen calligraphy (especially admiring the works of Zhong'an Gu), visit exhibitions, and listen to Cantonese pop (e.g., Peach Blossom Island and The One For U by MC) as well as Chinese pop music (e.g., Cang Jie and Step by Step by Mayday). I also stay active with table tennis (big fan of Zhendong Fan) and badminton. Oh, and one more thing: I have been a huge fan of Doraemon since childhood!
📝 Blog (coming soon)
Share thoughts on research, technology, and life insights.
If you have questions, feel free to reach out through email (xiaofengtan@seu.edu.cn)!