This retrospective, single-center observational study will use routinely collected perioperative data from adults undergoing surgery for symptomatic hemorrhoidal disease to identify data-driven clinical phenotypes. Unsupervised machine learning will be applied to characterize clusters of patients based on demographic, clinical, anatomical, and surgical variables. The study will explore whether the resulting phenotypes differ in operative complexity and postoperative course, and will generate hypotheses to inform future predictive models and personalized surgical planning.
Hemorrhoidal disease presents with heterogeneous symptom patterns, anatomical findings, and operative strategies that are not fully captured by traditional degree-based classifications. This study aims to identify latent, clinically interpretable phenotypes among surgical patients using a fully unsupervised machine learning pipeline applied to routinely collected perioperative data from a high-volume tertiary referral center.
This is a retrospective, observational analysis of de-identified institutional records. The analytic dataset will include routinely documented variables spanning baseline demographics/anthropometrics, symptom profile and relevant clinical history, operative technique and intraoperative descriptors, and routinely captured postoperative follow-up information. Data will be extracted using a predefined data dictionary and standardized preprocessing rules to support reproducibility and reduce variability in variable definitions.
The primary analytic approach will be unsupervised clustering. Variables will be cleaned and standardized prior to modeling. Dimensionality reduction will be performed using t-distributed stochastic neighbor embedding (t-SNE), initialized with principal component analysis to improve stability. Cluster discovery will then be conducted using k-means clustering on the reduced feature space. A range of cluster solutions will be explored, and the final solution will be selected using internal validity metrics (e.g., silhouette-based measures) together with assessment of clinical interpretability. Model robustness will be evaluated through repeated runs across multiple random seeds and key parameter settings to assess stability of cluster assignments.
After cluster assignment, clusters will be characterized using descriptive and comparative statistics to identify variables that most differentiate phenotypes.
Post-hoc feature relevance/importance approaches will be used to explore which demographic, clinical, and surgical factors most strongly contribute to cluster formation, with emphasis on effect sizes and clinically meaningful patterns rather than hypothesis-testing alone. Findings will be used to generate hypotheses regarding phenotypes that may be associated with greater operative complexity and different postoperative trajectories, supporting future work on predictive modeling and individualized surgical decision support. All analyses will be conducted within a controlled institutional environment using validated statistical and data-mining software, with documented parameter settings and version tracking to enable reproducibility. Only de-identified data will be used for analysis, and results will be reported in aggregate to protect patient privacy.
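The pipeline described above (standardization, PCA-initialized t-SNE, k-means on the reduced space, selection by silhouette) could look roughly like the following minimal sketch. It assumes scikit-learn and NumPy, and assumes all variables have already been encoded numerically. The function name `cluster_phenotypes`, the candidate range of k, the perplexity of 30, and the synthetic data are illustrative assumptions, not the study's actual settings or data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_phenotypes(X, k_range=range(2, 6), perplexity=30.0, seed=0):
    """Standardize, embed with PCA-initialized t-SNE, then score k-means by silhouette.

    Hypothetical helper: parameter defaults are placeholders, not protocol values.
    """
    X_std = StandardScaler().fit_transform(X)

    # 2-D t-SNE embedding, initialized from PCA as stated in the description.
    embedding = TSNE(
        n_components=2,
        init="pca",
        perplexity=perplexity,
        random_state=seed,
    ).fit_transform(X_std)

    # Fit k-means for each candidate k on the reduced feature space.
    results = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embedding)
        results[k] = (silhouette_score(embedding, labels), labels)

    # Highest silhouette only; clinical interpretability is judged separately.
    best_k = max(results, key=lambda k: results[k][0])
    return embedding, best_k, results


# Synthetic stand-in for the analytic dataset (100 rows, 6 numeric variables).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 6)) for c in (0.0, 6.0)])
embedding, best_k, results = cluster_phenotypes(X_demo)
```

Here `results[best_k]` holds the silhouette score and the cluster labels for the best-scoring k. In the study itself the final choice also depends on clinical interpretability, which this sketch does not model.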
Study Type
OBSERVATIONAL
Enrollment
100
standard hemorrhoidectomy, advanced hemorrhoidectomy, prolapsectomy, Doppler-guided procedures, or combined techniques
IRCCS Policlinico San Donato
San Donato Milanese, Milan, Italy
Internal validity of the unsupervised clustering solution (silhouette coefficient)
Silhouette coefficient of the final k-means clustering solution derived from t-SNE-reduced perioperative data. The silhouette coefficient will be used as the primary internal validity metric to quantify cluster cohesion and separation for the selected number of clusters.
Time frame: From completion of dataset extraction/cleaning through completion of clustering analysis (retrospective analysis of surgeries performed December 2024 to June 2025)
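For reference, the silhouette value of a point is (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster; the coefficient is the mean over all points. The following is a minimal standard-library sketch assuming Euclidean distance. The function name and the toy points are illustrative assumptions only.

```python
from math import dist
from statistics import mean


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members (the self-distance contributes 0).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, q) for q in other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy example: two tight, well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_coefficient(toy_points, [0, 0, 1, 1])
bad_score = silhouette_coefficient(toy_points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, as with the first labeling above. Values below 0 indicate that points sit closer to another cluster than to their own, as with the second.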
Cluster stability and reproducibility across model runs
Stability of cluster assignments across multiple random seeds and t-SNE parameter settings (including perplexity), summarized by reproducibility/consistency of membership and stability of internal validity metrics across runs.
Time frame: From completion of dataset extraction/cleaning through completion of clustering robustness analyses (retrospective analysis of surgeries performed December 2024 to June 2025)
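One way to summarize the consistency of cluster membership across runs is the mean pairwise adjusted Rand index (ARI) between label vectors obtained with different seeds or perplexity values. The record does not name a specific agreement measure, so ARI is an assumption here, chosen because it ignores arbitrary relabeling of clusters. The sketch below uses only the standard library; the function names and the example runs are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import mean


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same patients."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label vectors of equal length >= 2")

    # Pairs of patients grouped together in both, in A only, and in B only.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_both - expected) / (max_index - expected)


def mean_pairwise_ari(runs):
    """Average ARI over every pair of runs (e.g., different random seeds)."""
    return mean(adjusted_rand_index(a, b) for a, b in combinations(runs, 2))


# Hypothetical label vectors from three runs; run 2 only swaps the cluster names.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
stability = mean_pairwise_ari(runs)
```

An ARI of 1 means identical partitions up to relabeling, and values near 0 mean agreement no better than chance. A mean close to 1 across seeds and perplexity settings would support a stable solution.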
Operative duration (proxy of operative complexity)
Operative duration (minutes) recorded in the operative report/perioperative database; compared across identified phenotypes.
Time frame: Intraoperative (day of surgery)
Postoperative pain intensity
Pain intensity as documented in routine postoperative records/follow-up notes (e.g., numeric rating scale when available or clinician-documented pain status), analyzed as pain trajectory/pattern across early and intermediate follow-up and compared across clusters.
Time frame: From surgery to 6 months postoperatively
Postoperative complications (Clavien-Dindo classification)
Any postoperative complication recorded in routine follow-up, graded according to the Clavien-Dindo system; complication rates and severity compared across clusters.
Time frame: From surgery to 1 month postoperatively (early complications) and up to 6 months postoperatively (late complications)
Time to return to routine activities
Time to return to routine activities/work when documented in follow-up notes; compared across phenotypes.
Time frame: From surgery to 1 month postoperatively
Recurrence
Recurrence patterns or persistence/return of hemorrhoid-related symptoms (e.g., bleeding/prolapse/other symptoms) as documented in routine follow-up and need for re-evaluation or additional intervention; compared across clusters.
Time frame: 1 month and 6 months postoperatively