
CNN-Transformer with Absolute Positional Encoding Optimized for Low-Dimensional Inputs: Applied to Estimate Sliding Drop Width

Sajjad Shumaly, Fahimeh Darvish, Mahsa Salehi, Navid Mohammadi Foumani, Oleksandra Kukharenko, Hans Jürgen Butt, Ulrich Schwanecke, Rüdiger Berger

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

High-speed video recordings are crucial for investigating drop dynamics and their interactions with surfaces. Measuring the width of sliding drops, a key parameter linked to frictional forces, requires additional equipment like cameras or mirrors, complicating experimental setups and limiting observable areas. This study introduces a novel method that simplifies the measurement process by employing artificial neural networks to estimate millimeter-scale drop width directly from side-view video data. Our approach processes raw video footage to dynamically identify features most indicative of drop width. By treating drop behavior as an extrinsic time-series problem, our model effectively captures temporal dependencies in video sequences. We propose a VGG8-inspired architecture optimized for small and low information density video datasets. This architecture is combined with our novel position invariant video processing methodology that efficiently removes non-essential regions, reducing computation time by 84%. We further integrate ConvTran, a state-of-the-art time-series classification model, with an enhanced Absolute Position Encoding, improving the encoding’s dot-product and lowering drop width estimation errors. Our novel neural network architecture achieved a root mean square error of 48 μm (1.7% relative error), where each pixel corresponds to approximately 44 μm. Code and data are open-sourced at: https://github.com/shumaly/position_invariant_cnn_transformer.
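As a rough sanity check on the figures reported above, the following minimal sketch expresses the 48 μm root mean square error in pixel units using the stated scale of approximately 44 μm per pixel. The `rmse` helper and the sample width values are illustrative assumptions and are not taken from the authors' repository.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values quoted in the abstract.
RMSE_UM = 48.0        # reported root mean square error, in micrometers
PIXEL_SIZE_UM = 44.0  # approximate size of one pixel, in micrometers

# The reported error expressed in pixels: roughly 1.09 pixels.
rmse_in_pixels = RMSE_UM / PIXEL_SIZE_UM

# Hypothetical drop widths in micrometers, for illustration only.
example_actual_um = [2800.0, 2850.0, 2900.0]
example_predicted_um = [2840.0, 2810.0, 2950.0]
example_rmse_um = rmse(example_predicted_um, example_actual_um)

print(f"Reported RMSE in pixels: {rmse_in_pixels:.2f}")
print(f"Example RMSE: {example_rmse_um:.1f} um")
```

Under these assumptions, the reported error corresponds to slightly more than one pixel of the side-view footage.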

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track - European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings, Part IX
Editors: Inês Dutra, Alípio M. Jorge, Carlos Soares, João Gama, Mykola Pechenizkiy, Paulo Cortez, Sepideh Pashami, Pedro H. Abreu
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 3-21
Number of pages: 19
ISBN (Electronic): 9783032061188
ISBN (Print): 9783032061171
DOIs
Publication status: Published - 2026
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2025 - Porto, Portugal
Duration: 15 Sept 2025 – 19 Sept 2025
https://link.springer.com/book/10.1007/978-3-032-06078-5 (Proceedings)
https://ecmlpkdd.org/2025/ (Website)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 16021
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2025
Abbreviated title: ECML PKDD 2025
Country/Territory: Portugal
City: Porto
Period: 15/09/25 – 19/09/25
Internet address

Keywords

  • extrinsic time series
  • low-dimensional absolute positional encoding
  • position invariant video processing
  • spatiotemporal CNN–Transformer
