Publications | Xiaofeng Tan

2025

EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation

Xiaofeng Tan#, Wanjiang Weng#, Haodong Lei, and 1 more author

Under Review, 2025

In recent years, motion generative models have advanced significantly, yet they still pose challenges in aligning with downstream objectives. Recent studies have shown that using differentiable rewards to directly align the preference of diffusion models yields promising results. However, these methods suffer from inefficient and coarse-grained optimization with high memory consumption. In this work, we first theoretically identify the fundamental reason for these limitations: the recursive dependence between different steps in the denoising trajectory. Inspired by this insight, we propose EasyTune, which fine-tunes the diffusion model at each denoising step rather than over the entire trajectory. This decouples the recursive dependence, allowing us to perform optimization that is (1) dense and effective, (2) memory-efficient, and (3) fine-grained. Furthermore, the scarcity of preference motion pairs limits the training of motion reward models. To this end, we further introduce a Self-refinement Preference Learning (SPL) mechanism that dynamically identifies preference pairs and conducts preference learning. Extensive experiments demonstrate that EasyTune outperforms ReFL by 62.1% in MM-Dist improvement while requiring only 34.5% of its additional memory overhead.

Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

Wanjiang Weng#, Xiaofeng Tan#, Hongsong Wang, and 1 more author

Under Review, 2025

Bilingual text-to-motion generation, which synthesizes 3D human motions from bilingual text inputs, holds immense potential for cross-linguistic applications in gaming, film, and robotics. However, this task faces critical challenges: the absence of bilingual motion-language datasets and the misalignment between text and motion distributions in diffusion models, leading to semantically inconsistent or low-quality motions. To address these challenges, we propose BiHumanML3D, a novel bilingual human motion dataset, which establishes a crucial benchmark for bilingual text-to-motion generation models. Furthermore, we propose a Bilingual Motion Diffusion model (BiMD), which leverages cross-lingual aligned representations to capture semantics, thereby achieving a unified bilingual model. Building upon this, we propose the Reward-guided sampling Alignment (ReAlign) method, comprising a step-aware reward model to assess alignment quality during sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. This reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency and a motion-aligned module for realism, refining noisy motions at each timestep to balance probability density and alignment. Experiments demonstrate that our approach significantly improves text-motion alignment and motion quality compared to existing state-of-the-art methods.

Fuzzy Granule Density-Based Outlier Detection with Multi-Scale Granular Balls

Can Gao (Supervisor), Xiaofeng Tan^*, Jie Zhou, and 2 more authors

IEEE Transactions on Knowledge and Data Engineering, 2025

Outlier detection refers to the identification of anomalous samples that deviate significantly from the distribution of normal data and has been extensively studied and used in a variety of practical tasks. However, most unsupervised outlier detection methods are carefully designed to detect specified outliers, while real-world data may be entangled with different types of outliers. In this study, we propose a fuzzy rough sets-based multi-scale outlier detection method to identify various types of outliers. Specifically, a novel fuzzy rough sets-based method that integrates relative fuzzy granule density is first introduced to improve the capability of detecting local outliers. Then, a multi-scale view generation method based on granular-ball computing is proposed to collaboratively identify group outliers at different levels of granularity. Moreover, reliable outliers and inliers determined by the three-way decision are used to train a weighted support vector machine to further improve the performance of outlier detection. The proposed method innovatively transforms unsupervised outlier detection into a semi-supervised classification problem and for the first time explores the fuzzy rough sets-based outlier detection from the perspective of multi-scale granular balls, allowing for high adaptability to different types of outliers. Extensive experiments carried out on both artificial and UCI datasets demonstrate that the proposed outlier detection method significantly outperforms the state-of-the-art methods, improving the results by at least 8.48% in terms of the Area Under the ROC Curve (AUROC) index.
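
As a rough illustration of how a three-way decision can yield reliable outliers and inliers for later supervised training, the sketch below splits normalized outlier scores into three regions using two thresholds. The function name `three_way_partition`, the thresholds `alpha` and `beta`, and the assumption that scores lie in [0, 1] are all hypothetical choices for illustration, not details taken from the paper.

```python
def three_way_partition(scores, alpha=0.8, beta=0.2):
    """Split sample indices into (outliers, inliers, uncertain).

    Assumes each score is in [0, 1], with higher meaning more anomalous.
    alpha and beta are illustrative thresholds, not values from the paper.
    """
    if not beta < alpha:
        raise ValueError("beta must be smaller than alpha")
    outliers, inliers, uncertain = [], [], []
    for idx, score in enumerate(scores):
        if score >= alpha:
            outliers.append(idx)    # accepted as a reliable outlier
        elif score <= beta:
            inliers.append(idx)     # accepted as a reliable inlier
        else:
            uncertain.append(idx)   # deferred: boundary region
    return outliers, inliers, uncertain
```

In a pipeline like the one described, only the first two groups would serve as pseudo-labels for a downstream classifier, while the deferred samples are left for that classifier to resolve.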

2024

SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization

Xiaofeng Tan, Hongsong Wang, Xin Geng, and 1 more author

arXiv preprint arXiv:2412.05095 (Under Review), 2024

Text-to-motion generation is essential for advancing the creative industry but often presents challenges in producing consistent, realistic motions. To address this, we focus on fine-tuning text-to-motion models to consistently favor high-quality, human-preferred motions, a critical yet largely unexplored problem. In this work, we theoretically investigate Direct Preference Optimization (DPO) under both online and offline settings and reveal their respective limitations: overfitting in offline DPO and biased sampling in online DPO. Building on these theoretical insights, we introduce Semi-online Preference Optimization (SoPo), a DPO-based method for training text-to-motion models using “semi-online” data pairs, each consisting of an unpreferred motion from the online distribution and a preferred motion from offline datasets. This method leverages both online and offline DPO, allowing each to compensate for the other’s limitations. Extensive experiments demonstrate that SoPo outperforms other preference alignment methods, with an MM-Dist of 3.25% (vs. e.g. 0.76% for MoDiPO) on the MLD model and 2.91% (vs. e.g. 0.66% for MoDiPO) on the MDM model. Additionally, the MLD model fine-tuned with SoPo surpasses the SoTA model in terms of R-precision and MM-Dist. Visualization results also show the efficacy of SoPo in preference alignment.
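
For readers unfamiliar with DPO, the following minimal sketch shows the standard DPO loss for a single preference pair, together with one possible way to assemble a "semi-online" pair. The scalar log-likelihoods are placeholders (diffusion models typically substitute a denoising-loss surrogate), and picking the lowest-reward generated sample as the unpreferred motion is an assumption made here for illustration; `semi_online_pair` and `reward_fn` are hypothetical names, not the paper's API.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred w, unpreferred l) pair.

    Inputs are log-likelihoods under the trained policy and a frozen
    reference model; beta scales the implicit reward margin.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def semi_online_pair(offline_motion, generated, reward_fn):
    """Pair an offline preferred sample with a generated unpreferred one.

    Assumption: the generated sample with the lowest reward is treated
    as the unpreferred motion.
    """
    unpreferred = min(generated, key=reward_fn)
    return offline_motion, unpreferred
```

The loss decreases as the policy assigns relatively more probability to the preferred sample than the reference model does, which is the behavior any DPO variant builds on.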

Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection

Xiaofeng Tan, Hongsong Wang, Xin Geng, and 1 more author

arXiv preprint arXiv:2412.03044 (Under Review), 2024

Video anomaly detection is an essential yet challenging open-set task in computer vision, often addressed by leveraging reconstruction as a proxy task. However, existing reconstruction-based methods encounter challenges in two main aspects: (1) limited model robustness for open-set scenarios, and (2) an overemphasis on, but restricted capacity for, detailed motion reconstruction. To this end, we propose a novel frequency-guided diffusion model with perturbation training, which enhances model robustness through perturbation training and emphasizes the principal motion components guided by motion frequencies. Specifically, we first use a trainable generator to produce perturbative samples for perturbation training of the diffusion model. During the perturbation training phase, the model robustness is enhanced and the domain of the reconstructed model is broadened by training against this generator. Subsequently, perturbative samples are introduced for inference, which affects the reconstruction of normal and abnormal motions differently, thereby enhancing their separability. Considering that motion details originate from high-frequency information, we propose a masking method based on the 2D discrete cosine transform to separate high-frequency and low-frequency information. Guided by the high-frequency information from the observed motion, the diffusion model can focus on generating low-frequency information and thus reconstruct the motion accurately. Experimental results on five video anomaly detection datasets, including human-related and open-set benchmarks, demonstrate the effectiveness of the proposed method. The code will be released to the public.
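
The frequency separation mentioned above can be pictured with a small, dependency-free sketch of 2D DCT masking. It assumes a motion clip is a 2D array (rows as frames, columns as joint coordinates) and that "low frequency" means coefficients whose index sum `u + v` falls below a hypothetical `cutoff`; the actual mask used in the paper may differ.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply_2d(m, f):
    """Apply a 1D transform along rows, then along columns."""
    rows = [f(r) for r in m]
    return transpose([f(c) for c in transpose(rows)])

def split_frequencies(motion, cutoff):
    """Return (low, high) parts of motion; low + high equals motion."""
    coeffs = apply_2d(motion, dct_1d)
    # Keep coefficient (u, v) in the low band when u + v < cutoff (assumed mask).
    low_c = [[c if u + v < cutoff else 0.0 for v, c in enumerate(row)]
             for u, row in enumerate(coeffs)]
    high_c = [[c - l for c, l in zip(rc, rl)] for rc, rl in zip(coeffs, low_c)]
    return apply_2d(low_c, idct_1d), apply_2d(high_c, idct_1d)
```

Because the transform is linear and orthonormal, the two parts always sum back to the input, so one part can be held fixed as guidance while the other is generated.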

Multi-Scale Fuzzy Rough Sets based Anomaly Detection with Multiple Autoencoders

Xiaofeng Tan, Can Gao, Jie Zhou, and 1 more author

Under Review, 2024

Anomaly detection is a practical and essential research topic with a wide range of applications. However, existing anomaly detection methods may face challenges when handling high-dimensional data with complex distributions. In this study, we propose a multiple autoencoder-based anomaly detection method with the aid of fuzzy rough sets. Specifically, the autoencoder is first improved by introducing the kernel fuzzy relation to enhance its representation capability in low-dimensional space. Then, the theory of fuzzy rough sets is employed to perform anomaly detection in the learned low-dimensional representation by fusing multi-view proximity-based information. Finally, to handle complex data, multiple autoencoders are utilized to collaboratively detect anomalies by integrating local anomaly information from different perspectives. Comparative experiments conducted on the selected datasets reveal that the proposed method is superior to state-of-the-art methods, improving over the classical autoencoder by 5.58% in terms of the AUC-ROC index.

2023

Three-way decision-based co-detection for outliers

Xiaofeng Tan, Can Gao, Jie Zhou, and 1 more author

International Journal of Approximate Reasoning, 2023

Outlier detection is an important research topic in data mining and machine learning. However, existing unsupervised outlier detection methods suffer from irrelevant and redundant attributes in high-dimensional data, and their performance is also limited by outlier detection models that rely on only one view. In this study, we propose a three-way decision-based co-detection model for unsupervised outlier detection. Specifically, we first improve the local outlier factor (LOF) method by introducing the Gaussian kernel function to make the measure of local reachability density more accurate. Then, we introduce fuzzy rough sets to perform attribute reduction, which further reduces the negative effect of irrelevant and redundant attributes on the measure of sample similarity. Finally, we develop a co-detection model that is trained on the original view and the transformed view generated by principal component analysis and uses the three-way decision strategy to collaboratively detect outliers. The results of comparative experiments on the selected UCI datasets show that the proposed model outperforms state-of-the-art methods in terms of the AUC-ROC index.