cs.OH

3 posts

arXiv:2412.09651v2 Announce Type: replace Abstract: Coding morbidity data using international standard diagnostic classifications is increasingly important and still challenging. Clinical coders and physicians assign codes to patient episodes based on their interpretation of case notes or electronic patient records. Accurate coding therefore relies on the legibility of case notes and on the coders' understanding of medical terminology. Over the last ten years, many studies have shown poor reproducibility of clinical coding, including recent studies that apply Artificial Intelligence-based models. In this context, the paper presents the SISCO.web approach, designed to support physicians in filling in Hospital Discharge Records with the proper diagnosis and procedure codes from the International Classification of Diseases (9th and 10th revisions) and, above all, in identifying the main pathological condition. The web service combines NLP algorithms, specific coding rules, and ad hoc decision trees to identify the main condition, and shows promising results in providing accurate ICD coding suggestions.

Elena Cardillo, Lucilla Frattura · 3/10/2025

arXiv

cs.OH math.HO quant-ph

A QUBO Formulation for the Generalized Takuzu/LinkedIn Tango Game

arXiv:2501.00002v1 Announce Type: new Abstract: In this paper, we present a QUBO formulation for the Takuzu game (also known as Binairo), for LinkedIn's recent game Tango, and for its generalizations. We optimize the number of variables needed to solve the combinatorial problem, making the formulation suitable for quantum devices with fewer resources.

Alejandro Mata Ali, Edgar Mencia · 1/3/2025

arXiv
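The abstract does not spell out the formulation. As a generic illustration of how one Takuzu rule becomes a QUBO term, here is a minimal Python sketch of the standard squared-penalty encoding for the balance rule (every row and column holds equally many 0s and 1s). It is not the authors' variable-optimized formulation: the function names `balance_qubo`, `takuzu_balance_qubo`, and `energy` are hypothetical, and the other rules (no three equal adjacent cells, distinct rows and columns) are left out. The expansion used is (sum x_i - n/2)^2 = (1 - n) sum x_i + 2 sum_{i<j} x_i x_j + n^2/4, which relies on x_i^2 = x_i for binary variables.

```python
from itertools import combinations, product


def balance_qubo(cells, weight=1.0):
    """QUBO terms for weight * (sum(x_i) - n/2)**2 over binary cells.

    Because x_i**2 == x_i for binary variables, the square expands to
    linear terms (1 - n) * x_i, pairwise terms 2 * x_i * x_j and a
    constant n**2 / 4. Returns (Q, offset), Q keyed by ordered pairs.
    """
    n = len(cells)
    q = {}
    for c in cells:
        q[(c, c)] = q.get((c, c), 0.0) + weight * (1 - n)
    for a, b in combinations(cells, 2):
        key = (min(a, b), max(a, b))
        q[key] = q.get(key, 0.0) + 2 * weight
    return q, weight * n * n / 4


def takuzu_balance_qubo(n, weight=1.0):
    """Sum the balance penalty over every row and column of an n x n grid.

    Variable (r, c) is 1 if that cell holds a 1 and 0 otherwise.
    """
    lines = [[(r, c) for c in range(n)] for r in range(n)]
    lines += [[(r, c) for r in range(n)] for c in range(n)]
    q, offset = {}, 0.0
    for line in lines:
        line_q, line_offset = balance_qubo(line, weight)
        for key, coef in line_q.items():
            q[key] = q.get(key, 0.0) + coef
        offset += line_offset
    return q, offset


def energy(q, offset, x):
    """Evaluate the QUBO energy of an assignment x (dict: variable -> 0/1)."""
    return offset + sum(coef * x[i] * x[j] for (i, j), coef in q.items())


if __name__ == "__main__":
    # Brute-force check on one 4-cell row: the energy equals
    # (number_of_ones - 2)**2, so it is zero only for balanced rows.
    q, off = balance_qubo([0, 1, 2, 3])
    for bits in product([0, 1], repeat=4):
        print(bits, energy(q, off, dict(zip(range(4), bits))))
```

In a full model, the remaining Takuzu rules and the puzzle's given cells would be added as further penalty terms, and a solver (classical or quantum) would search for the minimum-energy assignment.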

cs.OH q-bio.GN

Motif Caller: Sequence Reconstruction for Motif-Based DNA Storage

arXiv:2412.16074v1 Announce Type: new Abstract: DNA data storage is rapidly gaining traction as a long-term data archival solution, primarily due to its exceptional durability. Retrieving stored data relies on DNA sequencing, which involves a process called basecalling -- a typically costly and slow task that uses machine learning to map raw sequencing signals back to individual DNA bases (which are then translated into digital bits to recover the data). Current basecalling models have been optimized for reading individual bases. However, with the advent of novel DNA synthesis methods tailored for data storage, there is significant potential for optimizing the reading process. In this paper, we focus on Motif-based DNA synthesis, where sequences are constructed from motifs -- groups of bases -- rather than individual bases. To enable efficient reading of data stored in DNA using Motif-based DNA synthesis, we designed Motif Caller, a machine learning model built to detect entire motifs within a DNA sequence rather than individual bases. Motifs can also be detected by first identifying individual bases with a basecaller and then searching for motifs; however, such an approach is unnecessarily complex and slow. A machine learning model that identifies motifs directly avoids the additional step of searching for motifs. It also exploits the greater number of features per motif, enabling motifs to be identified with higher accuracy. Motif Caller significantly enhances the efficiency and accuracy of data retrieval in DNA storage based on Motif-based DNA synthesis.

Parv Agarwal, Thomas Heinis · 12/23/2024
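To make the two-step baseline from the abstract (basecall first, then search for motifs) concrete, here is a minimal Python sketch of the search step only. It assumes a hypothetical library of equal-length motifs and a basecalled read with no insertions or deletions, and assigns each consecutive window to the nearest motif by Hamming distance. The names `motifs_from_bases` and `hamming` are illustrative and not part of Motif Caller, which instead uses a machine learning model to detect whole motifs directly.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def motifs_from_bases(read, motifs):
    """Map a basecalled read to a list of motif indices.

    Hypothetical baseline: the read is cut into consecutive windows of the
    motif length, and each window is assigned the motif with the smallest
    Hamming distance (ties go to the lower index). Assumes equal-length
    motifs and a read with no insertions or deletions; leftover bases
    shorter than one motif are ignored.
    """
    k = len(motifs[0])
    if any(len(m) != k for m in motifs):
        raise ValueError("all motifs must have the same length")
    calls = []
    for start in range(0, len(read) - k + 1, k):
        window = read[start:start + k]
        best = min(range(len(motifs)), key=lambda i: hamming(window, motifs[i]))
        calls.append(best)
    return calls


if __name__ == "__main__":
    library = ["ACGT", "TTGA", "GCAA"]  # hypothetical motif set
    # One substitution in the first motif (ACGT read as ACGA) is still
    # resolved to motif 0 because it is the closest match.
    print(motifs_from_bases("ACGAGCAATTGA", library))  # -> [0, 2, 1]
```

Real reads also contain insertions and deletions, which would break this fixed-window alignment and call for a more elaborate search; that extra machinery hints at why the abstract calls the two-step route unnecessarily complex.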