We gratefully acknowledge support from
the Simons Foundation and member institutions.

Computer Science

New submissions

[ total of 625 entries: 1-625 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Fri, 26 Apr 24

[1]  arXiv:2404.16037 [pdf, other]
Title: VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)

Sparse meteorological forecasting is indispensable for fine-grained weather forecasting and deserves extensive attention. Recent studies have highlighted the potential of spatio-temporal graph convolutional networks (ST-GCNs) in predicting numerical data from ground weather stations. However, as one of the highest fidelity and lowest latency data, the application of the vision data from satellites in ST-GCNs remains unexplored. There are few studies to demonstrate the effectiveness of combining the above multi-modal data for sparse meteorological forecasting. Towards this objective, we introduce Vision-Numerical Fusion Graph Convolutional Network (VN-Net), which mainly utilizes: 1) Numerical-GCN (N-GCN) to adaptively model the static and dynamic patterns of spatio-temporal numerical data; 2) Vision-LSTM Network (V-LSTM) to capture multi-scale joint channel and spatial features from time series satellite images; 4) a GCN-based decoder to generate hourly predictions of specified meteorological factors. As far as we know, VN-Net is the first attempt to introduce GCN method to utilize multi-modal data for better handling sparse spatio-temporal meteorological forecasting. Our experiments on Weather2k dataset show VN-Net outperforms state-of-the-art by a significant margin on mean absolute error (MAE) and root mean square error (RMSE) for temperature, relative humidity, and visibility forecasting. Furthermore, we conduct interpretation analysis and design quantitative evaluation metrics to assess the impact of incorporating vision data.

[2]  arXiv:2404.16038 [pdf, other]
Title: A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
Comments: 16 pages, 10 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)

This paper offers an insightful examination of how currently top-trending AI technologies, i.e., generative artificial intelligence (Generative AI) and large language models (LLMs), are reshaping the field of video technology, including video generation, understanding, and streaming. It highlights the innovative use of these technologies in producing highly realistic videos, a significant leap in bridging the gap between real-world dynamics and digital creation. The study also delves into the advanced capabilities of LLMs in video understanding, demonstrating their effectiveness in extracting meaningful information from visual content, thereby enhancing our interaction with videos. In the realm of video streaming, the paper discusses how LLMs contribute to more efficient and user-centric streaming experiences, adapting content delivery to individual viewer preferences. This comprehensive review navigates through the current achievements, ongoing challenges, and future possibilities of applying Generative AI and LLMs to video-related tasks, underscoring the immense potential these technologies hold for advancing the field of video technology related to multimedia, networking, and AI communities.

[3]  arXiv:2404.16039 [pdf, other]
Title: On a vectorized basic linear algebra package for prototyping codes in MATLAB
Comments: 35 pages, 8 figures
Subjects: Mathematical Software (cs.MS)

When writing high-performance code for numerical computation in a scripting language like MATLAB, it is crucial to have the operations in a large for-loop vectorized. If not, the code becomes too slow to use, even for a moderately large problem. However, in the process of vectorizing, the code often loses its original structure and becomes less readable. This is particularly true in the case of a finite element implementation, even though finite element methods are inherently structured. A basic remedy to this is the separation of the vectorization part from the mathematics part of the code, which is easily achieved through building the code on top of the basic linear algebra subprograms that are already vectorized codes, an idea that has been used in a series of papers over the last fifteen years, developing codes that are fast and still structured and readable. We discuss the vectorized basic linear algebra package and introduce a formalism using multi-linear algebra to explain and define formally the functions in the package, as well as MATLAB pagetime functions. We provide examples from computations of varying complexity, including the computation of normal vectors, volumes, and finite element methods. Benchmarking shows that we also get fast computations. Using the library, we can write codes that closely follow our mathematical thinking, making writing, following, reusing, and extending the code easier.

[4]  arXiv:2404.16041 [pdf, other]
Title: Forklift: An Extensible Neural Lifter
Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The escalating demand to migrate legacy software across different Instruction Set Architectures (ISAs) has driven the development of assembly-to-assembly translators to map between their respective assembly languages. However, the development of these tools requires substantial engineering effort. State-of-the-art approaches use lifting, a technique where source assembly code is translated to an architecture-independent intermediate representation (IR) (for example, the LLVM IR) and use a pre-existing compiler to recompile the IR to the target ISA. However, the hand-written rules these lifters employ are sensitive to the particular compiler and optimization level used to generate the code and require significant engineering effort to support each new ISA. We propose Forklift, the first neural lifter that learns how to translate assembly to LLVM IR using a token-level encoder-decoder Transformer. We show how to incrementally add support to new ISAs by fine tuning the assembly encoder and freezing the IR decoder, improving the overall accuracy and efficiency. We collect millions of parallel LLVM IR, x86, ARM, and RISC-V programs across compilers and optimization levels to train Forklift and set up an input/output-based accuracy harness. We evaluate Forklift on two challenging benchmark suites and translate 2.5x more x86 programs than a state-of-the-art hand-written lifter and 4.4x more x86 programs than GPT-4 as well as enabling translation from new ISAs.

[5]  arXiv:2404.16042 [pdf, ps, other]
Title: How explainable AI affects human performance: A systematic review of the behavioural consequences of saliency maps
Authors: Romy Müller
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Saliency maps can explain how deep neural networks classify images. But are they actually useful for humans? The present systematic review of 68 user studies found that while saliency maps can enhance human performance, null effects or even costs are quite common. To investigate what modulates these effects, the empirical outcomes were organised along several factors related to the human tasks, AI performance, XAI methods, images to be classified, human participants and comparison conditions. In image-focused tasks, benefits were less common than in AI-focused tasks, but the effects depended on the specific cognitive requirements. Moreover, benefits were usually restricted to incorrect AI predictions in AI-focused tasks but to correct ones in image-focused tasks. XAI-related factors had surprisingly little impact. The evidence was limited for image- and human-related factors and the effects were highly dependent on the comparison conditions. These findings may support the design of future user studies.

[6]  arXiv:2404.16043 [pdf, ps, other]
Title: A Genetic Algorithm-Based Support Vector Machine Approach for Intelligent Usability Assessment of m-Learning Applications
Comments: 20 pages, 22 Figures, Journal Paper
Journal-ref: Mobile Information Systems Mobile Information Systems, Volume 2022, Article ID 1609757, 20 pages
Subjects: Human-Computer Interaction (cs.HC)

In the field of human-computer interaction (HCI), the usability assessment of m-learning (mobile-learning) applications is a real challenge. Such assessment typically involves extraction of the best features of an application like efficiency, effectiveness, learnability, cognition, memorability, etc., and further ranking of those features for an overall assessment of the quality of the mobile application. In the previous literature, it is found that there is neither any theory nor any tool available to measure or assess a user perception and assessment of usability features of a m-learning application for the sake of ranking the graphical user interface of a mobile application in terms of a user acceptance and satisfaction. In this paper, a novel approach is presented by performing a mobile applications quantitative and qualitative analysis. Based on user requirements and perception, a criterion is defined based on a set of important features. Afterward, for the qualitative analysis, a genetic algorithm (GA) is used to score prescribed features for the usability assessment of a mobile application. The used approach assigns a score to each usability feature according to the user requirement and weight of each feature. GA performs the rank assessment process initially by performing feature selection and scoring the best features of the application. A comparison of assessment analysis of GA and various machine learning models, K-nearest neighbours, Naive Bayes, and Random Forests is performed. It was found that a GA-based support vector machine (SVM) provides more accuracy in the extraction of the best features of a mobile application and further ranking of those features.

[7]  arXiv:2404.16044 [pdf, other]
Title: Toward the Categorical Data Map
Comments: 12 pages, 10 figures, LaTeX
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, encoding attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot's visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based approach, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and show its benefits through an expert study with five data scientists analyzing the Titanic and Mushroom datasets with up to 23 attributes and 8124 category combinations. Our results indicate that the Categorical Data Map offers an effective analysis method, especially for large datasets with a high number of category combinations.

[8]  arXiv:2404.16045 [pdf, other]
Title: Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Requirements elicitation, a critical, yet time-consuming and challenging step in product development, often fails to capture the full spectrum of user needs. This may lead to products that fall short of expectations. This paper introduces a novel framework that leverages Large Language Models (LLMs) to automate and enhance the requirements elicitation process. LLMs are used to generate a vast array of simulated users (LLM agents), enabling the exploration of a much broader range of user needs and unforeseen use cases. These agents engage in product experience scenarios, through explaining their actions, observations, and challenges. Subsequent agent interviews and analysis uncover valuable user needs, including latent ones. We validate our framework with three experiments. First, we explore different methodologies for diverse agent generation, discussing their advantages and shortcomings. We measure the diversity of identified user needs and demonstrate that context-aware agent generation leads to greater diversity. Second, we show how our framework effectively mimics empathic lead user interviews, identifying a greater number of latent needs than conventional human interviews. Third, we showcase that LLMs can be used to analyze interviews, capture needs, and classify them as latent or not. Our work highlights the potential of using LLM agents to accelerate early-stage product development, reduce costs, and increase innovation.

[9]  arXiv:2404.16046 [pdf, other]
Title: Using Automated Vehicle Data as a Fitness Tracker for Sustainability
Comments: 6 pages, 5 figures, 1 table, 4th IEEE Forum for Innovative Sustainable Transportation Systems (Fists 2024)
Subjects: Human-Computer Interaction (cs.HC)

This work describes the use of on-board vehicle data from cars with advanced driver assistance features as a trip summary, with the goal of helping drivers contextualize their driving habits in terms of sustainability. The approach is similar to recent advancements in fitness tracking apps, which leverage smartwatches and other wearable devices to characterize activities during a workout or as part of daily fitness monitoring. Instead of adding new vehicle sensors, the data used for this work is from on-board driving data, namely, signals decoded from the vehicle's Controller Area Network (CAN) bus. With the deepening research of automatic driving technologies, Autonomous Vehicles (AVs) have gradually entered the consumer field, and more users are benefiting from the convenience and safety assistance provided by driving assistance and autonomous driving. However, various technical obstacles persist due to the complex environment, the non-communication of technologies, and users' trust. We propose indicators for evaluating the key characteristics of each drive, to facilitate drivers' familiarity with advanced driver assistance systems and to allow them to consider how different driving styles affect sustainability metrics. Further extensions will allow users to add feedback as part of the driving summary, laying a data foundation for future controller iterations based on real driving data and the attitude of drivers towards it.

[10]  arXiv:2404.16047 [pdf, other]
Title: From "AI" to Probabilistic Automation: How Does Anthropomorphization of Technical Systems Descriptions Influence Trust?
Comments: Accepted to FAccT 2024. arXiv admin note: text overlap with arXiv:2403.05957
Journal-ref: FAccT 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This paper investigates the influence of anthropomorphized descriptions of so-called "AI" (artificial intelligence) systems on people's self-assessment of trust in the system. Building on prior work, we define four categories of anthropomorphization (1. Properties of a cognizer, 2. Agency, 3. Biological metaphors, and 4. Properties of a communicator). We use a survey-based approach (n=954) to investigate whether participants are likely to trust one of two (fictitious) "AI" systems by randomly assigning people to see either an anthropomorphized or a de-anthropomorphized description of the systems. We find that participants are no more likely to trust anthropomorphized over de-anthropmorphized product descriptions overall. The type of product or system in combination with different anthropomorphic categories appears to exert greater influence on trust than anthropomorphizing language alone, and age is the only demographic factor that significantly correlates with people's preference for anthropomorphized or de-anthropomorphized descriptions. When elaborating on their choices, participants highlight factors such as lesser of two evils, lower or higher stakes contexts, and human favoritism as driving motivations when choosing between product A and B, irrespective of whether they saw an anthropomorphized or a de-anthropomorphized description of the product. Our results suggest that "anthropomorphism" in "AI" descriptions is an aggregate concept that may influence different groups differently, and provide nuance to the discussion of whether anthropomorphization leads to higher trust and over-reliance by the general public in systems sold as "AI".

[11]  arXiv:2404.16048 [pdf, other]
Title: GUIDE: Graphical User Interface Data for Execution
Comments: 11 pages, 8 figures, 3 Tables and 1 Algorithm
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo(62.67\%), Gmail(3.43\%), Calendar(10.98\%) and Canva(22.92\%). Each data entry includes an image, a task description, the last action taken, CoT and the next action to be performed along with grounding information of where the action needs to be executed. The data is collected using our in-house advanced annotation tool NEXTAG (Next Action Grounding and Annotation Tool). The data is adapted for multiple OS, browsers and display types. It is collected by multiple annotators to capture the variation of design and the way person uses a website.
Through this dataset, we aim to facilitate research and development in the realm of LLMs for graphical user interfaces, particularly in tasks related to RPA. The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks. We believe that our dataset will serve as a valuable resource for advancing the capabilities of multi-platform LLMs in practical applications, fostering innovation in the field of automation and natural language understanding. Using GUIDE, we build V-Zen, the first RPA model to automate multiple websites using our in-House Automation tool AUTONODE

[12]  arXiv:2404.16050 [pdf, ps, other]
Title: Implications of computer science theory for the simulation hypothesis
Authors: David H. Wolpert
Comments: 44 pages of text, 5 pages of references, 10 pages of appendices
Subjects: Logic in Computer Science (cs.LO); History and Philosophy of Physics (physics.hist-ph)

The simulation hypothesis has recently excited renewed interest, especially in the physics and philosophy communities. However, the hypothesis specifically concerns {computers} that simulate physical universes, which means that to properly investigate it we need to couple computer science theory with physics. Here I do this by exploiting the physical Church-Turing thesis. This allows me to introduce a preliminary investigation of some of the computer science theoretic aspects of the simulation hypothesis. In particular, building on Kleene's second recursion theorem, I prove that it is mathematically possible for us to be in a simulation that is being run on a computer \textit{by us}. In such a case, there would be two identical instances of us; the question of which of those is ``really us'' is meaningless. I also show how Rice's theorem provides some interesting impossibility results concerning simulation and self-simulation; briefly describe the philosophical implications of fully homomorphic encryption for (self-)simulation; briefly investigate the graphical structure of universes simulating universes simulating universes, among other issues. I end by describing some of the possible avenues for future research that this preliminary investigation reveals.

[13]  arXiv:2404.16051 [pdf, other]
Title: TimeFlows: Visualizing Process Chronologies from Vast Collections of Heterogeneous Information Objects
Comments: 16 pages, accepted at RCIS 2024
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)

In many fact-finding investigations, notably parliamentary inquiries, process chronologies are created to reconstruct how a controversial policy or decision came into existence. Current approaches, like timelines, lack the expressiveness to represent the variety of relations in which historic events may link to the overall chronology. This obfuscates the nature of the interdependence among the events, and the texts from which they are distilled. Based on explorative interviews with expert analysts, we propose an extended, rich set of relationships. We describe how these can be visualized as TimeFlows. We provide an example of such a visualization by illustrating the Childcare Benefits Scandal -- an affair that deeply affected Dutch politics in recent years. This work extends the scope of existing process discovery research into the direction of unveiling non-repetitive processes from unstructured information objects.

[14]  arXiv:2404.16053 [pdf, other]
Title: Human Latency Conversational Turns for Spoken Avatar Systems
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

A problem with many current Large Language Model (LLM) driven spoken dialogues is the response time. Some efforts such as Groq address this issue by lightning fast processing of the LLM, but we know from the cognitive psychology literature that in human-to-human dialogue often responses occur prior to the speaker completing their utterance. No amount of delay for LLM processing is acceptable if we wish to maintain human dialogue latencies. In this paper, we discuss methods for understanding an utterance in close to real time and generating a response so that the system can comply with human-level conversational turn delays. This means that the information content of the final part of the speaker's utterance is lost to the LLM. Using the Google NaturalQuestions (NQ) database, our results show GPT-4 can effectively fill in missing context from a dropped word at the end of a question over 60% of the time. We also provide some examples of utterances and the impacts of this information loss on the quality of LLM response in the context of an avatar that is currently under development. These results indicate that a simple classifier could be used to determine whether a question is semantically complete, or requires a filler phrase to allow a response to be generated within human dialogue time constraints.

[15]  arXiv:2404.16054 [pdf, other]
Title: LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in the task of mobile UI automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined ones, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device agent execution and faithful, scalable agent evaluation. By observing that the task execution process only transfers UI states, LlamaTouch employs a novel evaluation approach that only assesses whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) On-device task execution that enables mobile agents to interact with real mobile environments for task completion. (2) Fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives. (3) A multi-level state matching algorithm that utilizes exact and fuzzy matching to accurately detect critical information in each screen with unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 495 UI automation tasks, encompassing both tasks in the widely-used datasets and our self-constructed ones for more diverse mobile applications. Evaluation results demonstrate the LlamaTouch's high faithfulness of evaluation in real environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.

[16]  arXiv:2404.16055 [pdf, other]
Title: Assessing Climate Transition Risks in the Colombian Processed Food Sector: A Fuzzy Logic and Multicriteria Decision-Making Approach
Subjects: Artificial Intelligence (cs.AI)

Climate risk assessment is becoming increasingly important. For organisations, identifying and assessing climate-related risks is challenging, as they can come from multiple sources. This study identifies and assesses the main climate transition risks in the colombian processed food sector. As transition risks are vague, our approach uses Fuzzy Logic and compares it to various multi-criteria decision-making methods to classify the different climate transition risks an organisation may be exposed to. This approach allows us to use linguistic expressions for risk analysis and to better describe risks and their consequences. The results show that the risks ranked as the most critical for this organisation in their order were price volatility and raw materials availability, the change to less carbon-intensive production or consumption patterns, the increase in carbon taxes and technological change, and the associated development or implementation costs. These risks show a critical risk level, which implies that they are the most significant risks for the organisation in the case study. These results highlight the importance of investments needed to meet regulatory requirements, which are the main drivers for organisations at the financial level.

[17]  arXiv:2404.16056 [pdf, ps, other]
Title: Intelligent Machines and Incomplete Information
Comments: 35 pages
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)

The distribution of efficient individuals in the economy and the efforts that they will put in if they are hired, there are two important concerns for a technologically advanced firm. wants to open a new branch. The firm does not have information about the exact level of efficiency of an individual when she is hired. We call this situation incomplete information. The standard principal agent models assume that employees know their efficiency levels. Hence these models design incentive-compatible mechanisms. An incentive-compatible mechanism ensures that a participant does not have the incentive to misreport her efficiency level. This paper does not assume that employees know how efficient they are. This paper assumes that the production technology of the firm is intelligent, that is, the output of the machine reveals the efficiency levels of employees. Employees marginal contributions to the total output of the intelligent machine, the probability distribution of the levels of efficiency and employees costs of efforts together define a game of incomplete information. A characterization of ex-ante Nash Equilibrium is established. The results of the characterization formalize the relationship between the distribution of efficiency levels and the distribution of output.

[18]  arXiv:2404.16057 [pdf, ps, other]
Title: LuminLab: An AI-Powered Building Retrofit and Energy Modelling Platform
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This paper describes the technical and conceptual development of the LuminLab platform, an online tool that integrates a purpose-fit human-centric AI chatbot and predictive energy model into a streamlined front-end that can rapidly produce and discuss building retrofit plans in natural language. The platform provides users with the ability to engage with a range of possible retrofit pathways tailored to their individual budget and building needs on-demand. Given the complicated and costly nature of building retrofit projects, which rely on a variety of stakeholder groups with differing goals and incentives, we feel that AI-powered tools such as this have the potential to pragmatically de-silo knowledge, improve communication, and empower individual homeowners to undertake incremental retrofit projects that might not happen otherwise.

[19]  arXiv:2404.16060 [pdf, ps, other]
Title: Pocket Schlieren: a background oriented schlieren imaging platform on a smartphone
Comments: 24 pages, 6 figures, 4 Supplementary figures
Subjects: Human-Computer Interaction (cs.HC); Physics Education (physics.ed-ph); Optics (physics.optics)

Background-oriented schlieren (BOS) is a powerful technique for flow visualization. Nevertheless, the widespread dissemination of BOS is impeded by its dependence on scientific cameras, computing hardware, and dedicated analysis software. In this work, we aim to democratize BOS by providing a smartphone based scientific tool called "Pocket Schlieren". Pocket Schlieren enables users to directly capture, process, and visualize flow phenomena on their smartphones. The underlying algorithm incorporates consecutive frame subtraction (CFS) and optical flow (OF) techniques to compute the density gradients inside a flow. It performs on both engineered and natural background patterns. Using Pocket Schlieren, we successfully visualized the flow produced from a burning candle flame, butane lighter, hot soldering iron, room heater, water immersion heating rod, and a large outdoor butane flame. Pocket Schlieren promises to serve as a frugal yet potent instrument for scientific and educational purposes. We have made it publicly available at doi: 10.5281/zenodo.10949271.

[20]  arXiv:2404.16061 [pdf, ps, other]
Title: Dynamic Many Valued Logic Systems in Theoretical Economics
Authors: Daniel Lu
Subjects: Logic in Computer Science (cs.LO); Theoretical Economics (econ.TH)

This paper is an original attempt to understand the foundations of economic reasoning. It endeavors to rigorously define the relationship between subjective interpretations and objective valuations of such interpretations in the context of theoretical economics. This analysis is substantially expanded through a dynamic approach, where the truth of a valuation results in an updated interpretation or changes in the agent's subjective belief regarding the effectiveness of the selected action as well as the objective reality of the effectiveness of all other possible actions (i.e. consequence realization). Complications arise when the economic agent is presented with a set of actions that render ambiguous preference, or when the effectiveness of an action cannot be perceived upon its selection, thereby necessitating a different theory of choice and consequence realization.

[21]  arXiv:2404.16062 [pdf, other]
Title: QuickerCheck: Implementing and Evaluating a Parallel Run-Time for QuickCheck
Comments: 12 pages, IFL 2023
Subjects: Programming Languages (cs.PL)

This paper introduces a new parallel run-time for QuickCheck, a Haskell library and EDSL for specifying and randomly testing properties of programs. The new run-time can run multiple tests for a single property in parallel, using the available cores. Moreover, if a counterexample is found, the run-time can also shrink the test case in parallel, implementing a parallel search for a locally minimal counterexample.
Our experimental results show a 3--9$\times$ speed-up for testing QuickCheck properties on a variety of heavy-weight benchmark problems. We also evaluate two different shrinking strategies; deterministic shrinking, which guarantees to produce the same minimal test case as standard sequential shrinking, and greedy shrinking, which does not have this guarantee but still produces a locally minimal test case, and is faster in practice.

[22]  arXiv:2404.16063 [pdf, other]
Title: Chronological Outlooks of Globe Illustrated with Web-Based Visualization
Comments: 4 pages, 10 figures
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)

Developing visualizations with comprehensive annotations is crucial for research and educational purposes. We've been experimenting with various visualization tools like Plotly, Plotly.js, and D3.js to analyze global trends, focusing on areas such as Global Terrorism, the Global Air Quality Index (AQI), and Global Population dynamics. These visualizations help us gain insights into complex research topics, facilitating better understanding and analysis. We've created a single web homepage that links to three distinct visualization web pages, each exploring specific topics in depth. These webpages have been deployed on free cloud hosting servers such as Vercel and Render.

[23]  arXiv:2404.16064 [pdf, ps, other]
Title: Transparent AI: Developing an Explainable Interface for Predicting Postoperative Complications
Comments: 32 pages, 7 figures, 4 supplement figures and 1 supplement table
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

Given the sheer volume of surgical procedures and the significant rate of postoperative fatalities, assessing and managing surgical complications has become a critical public health concern. Existing artificial intelligence (AI) tools for risk surveillance and diagnosis often lack adequate interpretability, fairness, and reproducibility. To address this, we proposed an Explainable AI (XAI) framework designed to answer five critical questions: why, why not, how, what if, and what else, with the goal of enhancing the explainability and transparency of AI models. We incorporated various techniques such as Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), counterfactual explanations, model cards, an interactive feature manipulation interface, and the identification of similar patients to address these questions. We showcased an XAI interface prototype that adheres to this framework for predicting major postoperative complications. This initial implementation has provided valuable insights into the vast explanatory potential of our XAI framework and represents an initial step towards its clinical adoption.

[24]  arXiv:2404.16065 [pdf, other]
Title: mmWave Wearable Antenna for Interaction with VR Devices
Subjects: Human-Computer Interaction (cs.HC); Signal Processing (eess.SP)

The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior regardless of the lighting conditions. In this study, a millimeter-wave wearable antenna was developed, opening up the possibility for more immersive interaction with VR devices. The antenna features a low loss tangent polyester fabric to minimize dielectric losses and a smooth coating to reduce losses due to rough surfaces. The antenna operates in the 24GHz ISM band, with an S11 value of -29dB at 24.15GHz.

[25]  arXiv:2404.16066 [pdf, other]
Title: Social Media Use is Predictable from App Sequences: Using LSTM and Transformer Neural Networks to Model Habitual Behavior
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

The present paper introduces a novel approach to studying social media habits through predictive modeling of sequential smartphone user behaviors. While much of the literature on media and technology habits has relied on self-report questionnaires and simple behavioral frequency measures, we examine an important yet understudied aspect of media and technology habits: their embeddedness in repetitive behavioral sequences. Leveraging Long Short-Term Memory (LSTM) and transformer neural networks, we show that (i) social media use is predictable at the within and between-person level and that (ii) there are robust individual differences in the predictability of social media use. We examine the performance of several modeling approaches, including (i) global models trained on the pooled data from all participants, (ii) idiographic person-specific models, and (iii) global models fine-tuned on person-specific data. Neither person-specific modeling nor fine-tuning on person-specific data substantially outperformed the global models, indicating that the global models were able to represent a variety of idiosyncratic behavioral patterns. Additionally, our analyses reveal that the person-level predictability of social media use is not substantially related to the frequency of smartphone use in general or the frequency of social media use, indicating that our approach captures an aspect of habits that is distinct from behavioral frequency. Implications for habit modeling and theoretical development are discussed.

[26]  arXiv:2404.16067 [pdf, other]
Title: Layout2Rendering: AI-aided Greenspace design
Comments: 14 pages,8 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

In traditional human living environment landscape design, the establishment of three-dimensional models is an essential step for designers to intuitively present the spatial relationships of design elements, as well as a foundation for conducting landscape analysis on the site. Rapidly and effectively generating beautiful and realistic landscape spaces is a significant challenge faced by designers. Although generative design has been widely applied in related fields, they mostly generate three-dimensional models through the restriction of indicator parameters. However, the elements of landscape design are complex and have unique requirements, making it difficult to generate designs from the perspective of indicator limitations. To address these issues, this study proposes a park space generative design system based on deep learning technology. This system generates design plans based on the topological relationships of landscape elements, then vectorizes the plan element information, and uses Grasshopper to generate three-dimensional models while synchronously fine-tuning parameters, rapidly completing the entire process from basic site conditions to model effect analysis. Experimental results show that: (1) the system, with the aid of AI-assisted technology, can rapidly generate space green space schemes that meet the designer's perspective based on site conditions; (2) this study has vectorized and three-dimensionalized various types of landscape design elements based on semantic information; (3) the analysis and visualization module constructed in this study can perform landscape analysis on the generated three-dimensional models and produce node effect diagrams, allowing users to modify the design in real time based on the effects, thus enhancing the system's interactivity.

[27]  arXiv:2404.16068 [pdf, other]
Title: SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support fine-tuning setting and present SemEval Task 9: BRAIN-TEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks receive 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the ability of the systems to reason laterally. We hope that the BRAINTEASER(S) subtasks and findings in this paper can stimulate future work on lateral thinking and robust reasoning by computational models.

[28]  arXiv:2404.16069 [pdf, other]
Title: Interactive Visual Learning for Stable Diffusion
Comments: 4 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2305.03509
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce Diffusion Explainer, the first interactive visualization tool designed to elucidate how Stable Diffusion transforms text prompts into images. It tightly integrates a visual overview of Stable Diffusion's complex components with detailed explanations of their underlying operations. This integration enables users to fluidly transition between multiple levels of abstraction through animations and interactive elements. Offering real-time hands-on experience, Diffusion Explainer allows users to adjust Stable Diffusion's hyperparameters and prompts without the need for installation or specialized hardware. Accessible via users' web browsers, Diffusion Explainer is making significant strides in democratizing AI education, fostering broader public access. More than 7,200 users spanning 113 countries have used our open-sourced tool at https://poloclub.github.io/diffusion-explainer/. A video demo is available at https://youtu.be/MbkIADZjPnA.

[29]  arXiv:2404.16070 [pdf, other]
Title: VeGAn-Tool: A Fuzzy-logic Approach for Value-based Goal Model Analysis
Journal-ref: Science of Computer Programming, 2023, vol. 230, p. 103001
Subjects: Other Computer Science (cs.OH)

Goal-oriented analysis tools are used to assess goal models and assist analysts in decision-making. We introduce the VeGAn-Tool, which prioritizes goals according to their qualitative importance for the stakeholders and propagates this information in the goal model according to the different types of relationships. The FTOPSIS technique is used to calculate the value of each intentional element by employing the fuzzified importance (importance level fuzzified and refined by a confidence level) and the impact among the related intentional elements. The result is a prioritized goal model according to the value of each intentional element from the stakeholders' point of view.

[30]  arXiv:2404.16071 [pdf, ps, other]
Title: Augmenting the Author: Exploring the Potential of AI Collaboration in Academic Writing
Comments: 5 pages, workshop paper, CHI 2024 conference GENAI
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

This workshop paper presents a critical examination of the integration of Generative AI (Gen AI) into the academic writing process, focusing on the use of AI as a collaborative tool. It contrasts the performance and interaction of two AI models, Gemini and ChatGPT, through a collaborative inquiry approach where researchers engage in facilitated sessions to design prompts that elicit specific AI responses for crafting research outlines. This case study highlights the importance of prompt design, output analysis, and recognizing the AI's limitations to ensure responsible and effective AI integration in scholarly work. Preliminary findings suggest that prompt variation significantly affects output quality and reveals distinct capabilities and constraints of each model. The paper contributes to the field of Human-Computer Interaction by exploring effective prompt strategies and providing a comparative analysis of Gen AI models, ultimately aiming to enhance AI-assisted academic writing and prompt a deeper dialogue within the HCI community.

[31]  arXiv:2404.16072 [pdf, other]
Title: Playing Board Games with the Predict Results of Beam Search Algorithm
Authors: Sergey Pastukhov
Comments: 8 pages, 4 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper introduces a novel algorithm for two-player deterministic games with perfect information, which we call PROBS (Predict Results of Beam Search). Unlike existing methods that predominantly rely on Monte Carlo Tree Search (MCTS) for decision processes, our approach leverages a simpler beam search algorithm. We evaluate the performance of our algorithm across a selection of board games, where it consistently demonstrates an increased winning ratio against baseline opponents. A key result of this study is that the PROBS algorithm operates effectively, even when the beam search size is considerably smaller than the average number of turns in the game.

[32]  arXiv:2404.16074 [pdf, other]
Title: Explaining AI Decisions: Towards Achieving Human-Centered Explainability in Smart Home Environments
Comments: This is the pre-print version of our accepted paper at the 2nd World Conference on eXplainable Artificial Intelligence (xAI2024), which will be held in Valletta, Malta in 17-19 July, 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)

Smart home systems are gaining popularity as homeowners strive to enhance their living and working environments while minimizing energy consumption. However, the adoption of artificial intelligence (AI)-enabled decision-making models in smart home systems faces challenges due to the complexity and black-box nature of these systems, leading to concerns about explainability, trust, transparency, accountability, and fairness. The emerging field of explainable artificial intelligence (XAI) addresses these issues by providing explanations for the models' decisions and actions. While state-of-the-art XAI methods are beneficial for AI developers and practitioners, they may not be easily understood by general users, particularly household members. This paper advocates for human-centered XAI methods, emphasizing the importance of delivering readily comprehensible explanations to enhance user satisfaction and drive the adoption of smart home systems. We review state-of-the-art XAI methods and prior studies focusing on human-centered explanations for general users in the context of smart home applications. Through experiments on two smart home application scenarios, we demonstrate that explanations generated by prominent XAI techniques might not be effective in helping users understand and make decisions. We thus argue for the necessity of a human-centric approach in representing explanations in smart home systems and highlight relevant human-computer interaction (HCI) methodologies, including user studies, prototyping, technology probes analysis, and heuristic evaluation, that can be employed to generate and present human-centered explanations to users.

[33]  arXiv:2404.16075 [pdf, other]
Title: Validating Traces of Distributed Programs Against TLA+ Specifications
Subjects: Programming Languages (cs.PL); Software Engineering (cs.SE)

TLA+ is a formal language for specifying systems, including distributed algorithms, that is supported by powerful verification tools. In this work we present a framework for relating traces of distributed programs to high-level specifications written inTLA+. The problem is reduced to a constrained model checking problem, realized using the TLC model checker. Our framework consists of an API for instrumenting Java programs in order to record traces of executions, of a collection of TLA+ operators that are used for relating those traces to specifications, and of scripts for running the model checker.Crucially, traces only contain updates to specification variables rather than full values, and it is not necessary to provide values for all variables. We have applied our approach to several distributed programs, detecting discrepancies between the specifications and the implementations in all cases. We discuss reasons for these discrepancies, how to interpret the verdict produced by TLC, and how to take into account the results of trace validation for implementation development.

[34]  arXiv:2404.16076 [pdf, other]
Title: Semantic Evolvement Enhanced Graph Autoencoder for Rumor Detection
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

Due to the rapid spread of rumors on social media, rumor detection has become an extremely important challenge. Recently, numerous rumor detection models which utilize textual information and the propagation structure of events have been proposed. However, these methods overlook the importance of semantic evolvement information of event in propagation process, which is often challenging to be truly learned in supervised training paradigms and traditional rumor detection methods. To address this issue, we propose a novel semantic evolvement enhanced Graph Autoencoder for Rumor Detection (GARD) model in this paper. The model learns semantic evolvement information of events by capturing local semantic changes and global semantic evolvement information through specific graph autoencoder and reconstruction strategies. By combining semantic evolvement information and propagation structure information, the model achieves a comprehensive understanding of event propagation and perform accurate and robust detection, while also detecting rumors earlier by capturing semantic evolvement information in the early stages. Moreover, in order to enhance the model's ability to learn the distinct patterns of rumors and non-rumors, we introduce a uniformity regularizer to further improve the model's performance. Experimental results on three public benchmark datasets confirm the superiority of our GARD method over the state-of-the-art approaches in both overall performance and early rumor detection.

[35]  arXiv:2404.16077 [pdf, other]
Title: Supercompiler Code Optimization with Zero-Shot Reinforcement Learning
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG)

Effective code optimization in compilers plays a central role in computer and software engineering. While compilers can be made to automatically search the optimization space without the need for user interventions, this is not a standard practice since the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large data to produce effective optimization strategies instantly for each program in a single trial of the agent. To overcome the huge range of possible test programs, we prepare a large dataset of training programs that emphasize quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent in a sample-efficient manner through interacting with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates our agent's supercompiler performances and zero-shot generalization abilities, outperforming built-in optimization options designed by compiler experts. Our methodology kindles the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in the realm of code optimization.

[36]  arXiv:2404.16078 [pdf, other]
Title: Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective
Authors: Vaisakh Shaj
Comments: Doctoral Dissertation Preprint, Department of Computer Science, Karlsruhe Institute Of Technology, 2024
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and propose two new probabilistic formalisms namely Hidden-Parameter SSMs and Multi-Time Scale SSMs to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discuss how these formalisms are in line with related neuroscience literature on Bayesian brain hypothesis and predicitive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.

[37]  arXiv:2404.16109 [pdf, other]
Title: zkLLM: Zero Knowledge Proofs for Large Language Models
Comments: Accepted to ACM CCS 2024, camera-ready version under preparation. This is the author's version of the work. It is posted here for your personal use. Not for redistribution
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

The recent surge in artificial intelligence (AI), characterized by the prominence of large language models (LLMs), has ushered in fundamental transformations across the globe. However, alongside these advancements, concerns surrounding the legitimacy of LLMs have grown, posing legal challenges to their extensive applications. Compounding these concerns, the parameters of LLMs are often treated as intellectual property, restricting direct investigations.
In this study, we address a fundamental challenge within the realm of AI legislation: the need to establish the authenticity of outputs generated by LLMs. To tackle this issue, we present zkLLM, which stands as the inaugural specialized zero-knowledge proof tailored for LLMs to the best of our knowledge. Addressing the persistent challenge of non-arithmetic operations in deep learning, we introduce tlookup, a parallelized lookup argument designed for non-arithmetic tensor operations in deep learning, offering a solution with no asymptotic overhead. Furthermore, leveraging the foundation of tlookup, we introduce zkAttn, a specialized zero-knowledge proof crafted for the attention mechanism, carefully balancing considerations of running time, memory usage, and accuracy.
Empowered by our fully parallelized CUDA implementation, zkLLM emerges as a significant stride towards achieving efficient zero-knowledge verifiable computations over LLMs. Remarkably, for LLMs boasting 13 billion parameters, our approach enables the generation of a correctness proof for the entire inference process in under 15 minutes. The resulting proof, compactly sized at less than 200 kB, is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.

[38]  arXiv:2404.16111 [pdf, other]
Title: Forcing, Transition Algebras, and Calculi
Subjects: Logic in Computer Science (cs.LO)

We bring forward a logical system of transition algebras that enhances many-sorted first-order logic using features from dynamic logics. The sentences we consider include compositions, unions, and transitive closures of transition relations, which are treated similarly to the actions used in dynamic logics in order to define necessity and possibility operators. This leads to a higher degree of expressivity than that of many-sorted first-order logic. For example, one can finitely axiomatize both the finiteness and the reachability of models, neither of which are ordinarily possible in many-sorted first-order logic. We introduce syntactic entailment and study basic properties such as compactness and completeness, showing that the latter does not hold when standard finitary proof rules are used. Consequently, we define proof rules having both finite and countably infinite premises, and we provide conditions under which completeness can be proved. To that end, we generalize the forcing method introduced in model theory by Robinson from a single signature to a category of signatures, and we apply it to obtain a completeness result for signatures that are at most countable.

[39]  arXiv:2404.16112 [pdf, other]
Title: Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short Term Memory Networks (LSTMs) have historically dominated sequence modeling tasks like Machine Translation, Named Entity Recognition (NER), etc. However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations have been proposed to address these issues which use spectral networks or convolutions and have performed well on a range of tasks. However, they still have difficulty in dealing with long sequences. State Space Models(SSMs) have emerged as promising alternatives for sequence modeling paradigms in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagnol State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms namely, Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, Glue, Pile, ImageNet, Kinetics-400, sstv2, as well as video datasets such as Breakfast, COIN, LVU, and various time series datasets. The project page for Mamba-360 work is available on this webpage.\url{https://github.com/badripatro/mamba360}.

[40]  arXiv:2404.16113 [pdf, other]
Title: Joint operation of a fast-charging EV hub with a stand-alone independent battery storage system under fairness considerations
Subjects: Systems and Control (eess.SY)

The need for larger-scale fast-charging electric vehicle (EV) hubs is on the rise due to the growth in EV adoption. Another area of power infrastructure growth is the proliferation of independently operated stand-alone battery storage systems (BSS), which is fueled by improvements and cost reductions in battery technology. Many possible uses of the stand-alone BSS are being explored including participation in the energy and ancillary markets, load balancing for renewable generations, and supporting large-scale load-consuming entities like hospitals. In this paper, we study a novel usage of the stand-alone BSS whereby in addition to participating in the electricity reserve market, it allows an EV hub to use a part of its storage capacity, when profitable. The hub uses the BSS storage capacity for arbitrage consequently reducing its operating cost. We formulate this joint operation as a bi-objective optimization model. We then reformulate it into a second-order cone Nash bargaining problem, the solution of which guarantees fairness to both the hub and the BSS. A sample numerical case study is formulated using actual prices of electricity and simulated data for the reserve market and EV charging demand. The Nash bargaining solution shows that both participants can benefit from the joint operation.

[41]  arXiv:2404.16115 [pdf, other]
Title: Online Personalizing White-box LLMs Generation with Neural Bandits
Comments: 7 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in terms of best ROUGE scores and up to 2.76% increase in LLM-agent evaluation against the baseline.

[42]  arXiv:2404.16116 [pdf, other]
Title: Classifying Human-Generated and AI-Generated Election Claims in Social Media
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Politics is one of the most prevalent topics discussed on social media platforms, particularly during major election cycles, where users engage in conversations about candidates and electoral processes. Malicious actors may use this opportunity to disseminate misinformation to undermine trust in the electoral process. The emergence of Large Language Models (LLMs) exacerbates this issue by enabling malicious actors to generate misinformation at an unprecedented scale. Artificial intelligence (AI)-generated content is often indistinguishable from authentic user content, raising concerns about the integrity of information on social networks. In this paper, we present a novel taxonomy for characterizing election-related claims. This taxonomy provides an instrument for analyzing election-related claims, with granular categories related to jurisdiction, equipment, processes, and the nature of claims. We introduce ElectAI, a novel benchmark dataset that consists of 9,900 tweets, each labeled as human- or AI-generated. For AI-generated tweets, the specific LLM variant that produced them is specified. We annotated a subset of 1,550 tweets using the proposed taxonomy to capture the characteristics of election-related claims. We explored the capabilities of LLMs in extracting the taxonomy attributes and trained various machine learning models using ElectAI to distinguish between human- and AI-generated posts and identify the specific LLM variant.

[43]  arXiv:2404.16117 [pdf, other]
Title: Cybersecurity Assessment of the Polar Bluetooth Low Energy Heart-rate Sensor
Authors: Smone Soderi
Comments: Body Area Networks: Smart IoT and Big Data for Intelligent Health Management, 2019
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Networking and Internet Architecture (cs.NI)

Wireless communications among wearable and implantable devices implement the information exchange around the human body. Wireless body area network (WBAN) technology enables non-invasive applications in our daily lives. Wireless connected devices improve the quality of many services, and they make procedures easier. On the other hand, they open up large attack surfaces and introduces potential security vulnerabilities. Bluetooth low energy (BLE) is a low-power protocol widely used in wireless personal area networks (WPANs). This paper analyzes the security vulnerabilities of a BLE heart-rate sensor. By observing the received signal strength indicator (RSSI) variations, it is possible to detect anomalies in the BLE connection. The case-study shows that an attacker can easily intercept and manipulate the data transmitted between the mobile app and the BLE device. With this research, the author would raise awareness about the security of the heart-rate information that we can receive from our wireless body sensors.

[44]  arXiv:2404.16118 [pdf, other]
Title: Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models
Comments: 12 pages
Subjects: Cryptography and Security (cs.CR)

With the increasing prevalence of security incidents, the adoption of deception-based defense strategies has become pivotal in cyber security. This work addresses the challenge of scalability in designing honeytokens, a key component of such defense mechanisms. The manual creation of honeytokens is a tedious task. Although automated generators exists, they often lack versatility, being specialized for specific types of honeytokens, and heavily rely on suitable training datasets. To overcome these limitations, this work systematically investigates the approach of utilizing Large Language Models (LLMs) to create a variety of honeytokens. Out of the seven different honeytoken types created in this work, such as configuration files, databases, and log files, two were used to evaluate the optimal prompt. The generation of robots.txt files and honeywords was used to systematically test 210 different prompt structures, based on 16 prompt building blocks. Furthermore, all honeytokens were tested across different state-of-the-art LLMs to assess the varying performance of different models. Prompts performing optimally on one LLMs do not necessarily generalize well to another. Honeywords generated by GPT-3.5 were found to be less distinguishable from real passwords compared to previous methods of automated honeyword generation. Overall, the findings of this work demonstrate that generic LLMs are capable of creating a wide array of honeytokens using the presented prompt structures.

[45]  arXiv:2404.16120 [pdf, other]
Title: Securing Hybrid Wireless Body Area Networks (HyWBAN): Advancements in Semantic Communications and Jamming Techniques
Journal-ref: Digital Health and Wireless Solutions, 2024
Subjects: Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

This paper explores novel strategies to strengthen the security of Hybrid Wireless Body Area Networks (HyWBANs), essential in smart healthcare and Internet of Things (IoT) applications. Recognizing the vulnerability of HyWBAN to sophisticated cyber-attacks, we propose an innovative combination of semantic communications and jamming receivers. This dual-layered security mechanism protects against unauthorized access and data breaches, particularly in scenarios involving in-body to on-body communication channels. We conduct comprehensive laboratory measurements to understand hybrid (radio and optical) communication propagation through biological tissues and utilize these insights to refine a dataset for training a Deep Learning (DL) model. These models, in turn, generate semantic concepts linked to cryptographic keys for enhanced data confidentiality and integrity using a jamming receiver. The proposed model demonstrates a significant reduction in energy consumption compared to traditional cryptographic methods, like Elliptic Curve Diffie-Hellman (ECDH), especially when supplemented with jamming. Our approach addresses the primary security concerns and sets the baseline for future secure biomedical communication systems advancements.

[46]  arXiv:2404.16122 [pdf, ps, other]
Title: Generalized Optimization Modulo Theories
Subjects: Logic in Computer Science (cs.LO)

Optimization Modulo Theories (OMT) has emerged as an important extension of the highly successful Satisfiability Modulo Theories (SMT) paradigm. The OMT problem requires solving an SMT problem with the restriction that the solution must be optimal with respect to a given objective function. We introduce a generalization of the OMT problem where, in particular, objective functions can range over partially ordered sets. We provide a formalization of and an abstract calculus for the generalized OMT problem and prove their key correctness properties. Generalized OMT extends previous work on OMT in several ways. First, in contrast to many current OMT solvers, our calculus is theory-agnostic, enabling the optimization of queries over any theories or combinations thereof. Second, our formalization unifies both single- and multi-objective optimization problems, allowing us to study them both in a single framework and facilitating the use of objective functions that are not supported by existing OMT approaches. Finally, our calculus is sufficiently general to fully capture a wide variety of current OMT approaches (each of which can be realized as a specific strategy for rule application in the calculus) and to support the exploration of new search strategies. Much like the original abstract DPLL(T) calculus for SMT, our Generalized OMT calculus is designed to establish a theoretical foundation for understanding and research and to serve as a framework for studying variations of and extensions to existing OMT methodologies.

[47]  arXiv:2404.16123 [pdf, other]
Title: FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Comments: Conference paper at CVPR 2024. 6 pages, 8 figures. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web -- datasets that are known to harbor harmful social biases that may then be codified in trained models. In this work, we evaluate how deduplication affects the prevalence of these biases in the resulting trained models and introduce an easy-to-implement modification to the recent SemDeDup algorithm that can reduce the negative effects that we observe. When examining CLIP-style models trained on deduplicated variants of LAION-400M, we find our proposed FairDeDup algorithm consistently leads to improved fairness metrics over SemDeDup on the FairFace and FACET datasets while maintaining zero-shot performance on CLIP benchmarks.

[48]  arXiv:2404.16124 [pdf, ps, other]
Title: A Situated-Infrastructuring of WhatsApp for Business in India
Authors: Ankolika De
Comments: 7 pages, Accepted to ACM CHI 2024 as an Extended Abstract. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI'24)
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)

WhatsApp has become a pivotal communication tool in India, transcending cultural boundaries and deeply integrating into the nation's digital landscape. Meta's introduction of WhatsApp for Business aligns seamlessly with the platform's popularity, offering businesses a crucial tool. However, the monetization plans pose challenges, particularly for smaller businesses, in balancing revenue goals with accessibility. This study, employing discourse analysis, examines Meta's infrastructuring of WhatsApp in India, emphasizing the dynamic interplay of technological, social, and cultural dimensions. Consequently, it highlights potential power differences caused by the deployment of WhatsApp for Business followed by its gradual but significant modifications, encouraging scholars to investigate the implications and ethics of rapid technological changes, particularly for marginalized users.

[49]  arXiv:2404.16127 [pdf, other]
Title: Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data from 27478 admissions to the University Hospitals Leuven, covering 30862 catheter episodes (970 CLABSI, 1466 deaths and 28426 discharges) to build static and dynamic RF models for binary (CLABSI vs no CLABSI), multinomial (CLABSI, discharge, death or no event), survival (time to CLABSI) and competing risks (time to CLABSI, discharge or death) outcomes to predict the 7-day CLABSI risk. We evaluated model performance across 100 train/test splits. Performance of binary, multinomial and competing risks models was similar: AUROC was 0.74 for baseline predictions, rose to 0.78 for predictions at day 5 in the catheter episode, and decreased thereafter. Survival models overestimated the risk of CLABSI (E:O ratios between 1.2 and 1.6), and had AUROCs about 0.01 lower than other models. Binary and multinomial models had lowest computation times. Models including multiple outcome events (multinomial and competing risks) display a different internal structure compared to binary and survival models. In the absence of censoring, complex modelling choices do not considerably improve the predictive performance compared to a binary model for CLABSI prediction in our studied settings. Survival models censoring the competing events at their time of occurrence should be avoided.

[50]  arXiv:2404.16130 [pdf, other]
Title: From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a na\"ive RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.

[51]  arXiv:2404.16131 [pdf, other]
Title: Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Cluster deletion is an NP-hard graph clustering objective with applications in computational biology and social network analysis, where the goal is to delete a minimum number of edges to partition a graph into cliques. We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3. Moreover, we show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a vertex of maximum degree in an auxiliary graph and forming a cluster around it. One of these algorithms relies on solving a linear program. Our final contribution is to design a new and purely combinatorial approach for doing so that is far more scalable in theory and practice.

[52]  arXiv:2404.16133 [pdf, ps, other]
Title: Quantitative Characterization of Retinal Features in Translated OCTA
Comments: The article has been revised and edited
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.

[53]  arXiv:2404.16134 [pdf, ps, other]
Title: Power Failure Cascade Prediction using Graph Neural Networks
Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). Oct. 31, 2023. See implementations at this https URL
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various levels of granularity. We present several error metrics that gauge the model's ability to predict the failure size, the final grid state, and the failure time steps of each branch within the cascade. We benchmark the graph neural network model against influence models. We show that, in addition to being generic over randomly scaled power injection values, the graph neural network model outperforms multiple influence models that are built specifically for their corresponding loading profiles. Finally, we show that the proposed model reduces the computational time by almost two orders of magnitude.

[54]  arXiv:2404.16136 [pdf, other]
Title: 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement
Comments: Accepted at 6th Workshop and Competition on Affective Behavior Analysis in-the-wild - CVPR 2024 Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the field of 3D Human Pose Estimation (HPE), accurately estimating human pose, especially in scenarios with occlusions, is a significant challenge. This work identifies and addresses a gap in the current state of the art in 3D HPE concerning the scarcity of data and strategies for handling occlusions. We introduce our novel BlendMimic3D dataset, designed to mimic real-world situations where occlusions occur for seamless integration in 3D HPE algorithms. Additionally, we propose a 3D pose refinement block, employing a Graph Convolutional Network (GCN) to enhance pose representation through a graph model. This GCN block acts as a plug-and-play solution, adaptable to various 3D HPE frameworks without requiring retraining them. By training the GCN with occluded data from BlendMimic3D, we demonstrate significant improvements in resolving occluded poses, with comparable results for non-occluded ones. Project web page is available at https://blendmimic3d.github.io/BlendMimic3D/.

[55]  arXiv:2404.16137 [pdf, ps, other]
Title: Learned Pulse Shaping Design for PAPR Reduction in DFT-s-OFDM
Comments: 5 pages, under review
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

High peak-to-average power ratio (PAPR) is one of the main factors limiting cell coverage for cellular systems, especially in the uplink direction. Discrete Fourier transform spread orthogonal frequency-domain multiplexing (DFT-s-OFDM) with spectrally-extended frequency-domain spectrum shaping (FDSS) is one of the efficient techniques deployed to lower the PAPR of the uplink waveforms. In this work, we propose a machine learning-based framework to determine the FDSS filter, optimizing a tradeoff between the symbol error rate (SER), the PAPR, and the spectral flatness requirements. Our end-to-end optimization framework considers multiple important design constraints, including the Nyquist zero-ISI (inter-symbol interference) condition. The numerical results show that learned FDSS filters lower the PAPR compared to conventional baselines, with minimal SER degradation. Tuning the parameters of the optimization also helps us understand the fundamental limitations and characteristics of the FDSS filters for PAPR reduction.

[56]  arXiv:2404.16138 [pdf, other]
Title: Logic Dynamic Movement Primitives for Long-horizon Manipulation Tasks in Dynamic Environments
Comments: Submitted to IEEE RA-L for potential publication
Subjects: Robotics (cs.RO)

Learning from Demonstration (LfD) stands as an efficient framework for imparting human-like skills to robots. Nevertheless, designing an LfD framework capable of seamlessly imitating, generalizing, and reacting to disturbances for long-horizon manipulation tasks in dynamic environments remains a challenge. To tackle this challenge, we present Logic Dynamic Movement Primitives (Logic-DMP), which combines Task and Motion Planning (TAMP) with an optimal control formulation of DMP, allowing us to incorporate motion-level via-point specifications and to handle task-level variations or disturbances in dynamic environments. We conduct a comparative analysis of our proposed approach against several baselines, evaluating its generalization ability and reactivity across three long-horizon manipulation tasks. Our experiment demonstrates the fast generalization and reactivity of Logic-DMP for handling task-level variants and disturbances in long-horizon manipulation tasks.

[57]  arXiv:2404.16139 [pdf, other]
Title: A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges
Comments: 8 pages, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies to counter adversarial attacks and defenses, as well as approaches to adapt to domain shifts. The objective is to present an overview of how intermediate fusion methods effectively meet these diverse challenges, highlighting their role in advancing the field of collaborative perception in autonomous driving.

[58]  arXiv:2404.16143 [pdf, ps, other]
Title: A Two-Phase Infinite/Finite Low-Level Memory Model
Subjects: Programming Languages (cs.PL)

This paper provides a novel approach to reconciling complex low-level memory model features, such as pointer--integer casts, with desired refinements that are needed to justify the correctness of program transformations. The idea is to use a "two-phased" memory model, one with and unbounded memory and corresponding unbounded integer type, and one with a finite memory; the connection between the two levels is made explicit by our notion of refinement that handles out-of-memory behaviors. This approach allows for more optimizations to be performed and establishes a clear boundary between the idealized semantics of a program and the implementation of that program on finite hardware.
To demonstrate the utility of this idea in practice, we instantiate the two-phase memory model in the context of Zakowski et al.'s VIR semantics, yielding infinite and finite memory models of LLVM IR, including low-level features like undef and bitcast. Both the infinite and finite models, which act as specifications, can provably be refined to executable reference interpreters. The semantics justify optimizations, such as dead-alloca-elimination, that were previously impossible or difficult to prove correct.

[59]  arXiv:2404.16147 [pdf, other]
Title: Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model
Subjects: Robotics (cs.RO)

The advent of Large Language Models (LLM) provides new insights to validate Automated Driving Systems (ADS). In the herein-introduced work, a novel approach to extracting scenarios from naturalistic driving datasets is presented. A framework called Chat2Scenario is proposed leveraging the advanced Natural Language Processing (NLP) capabilities of LLM to understand and identify different driving scenarios. By inputting descriptive texts of driving conditions and specifying the criticality metric thresholds, the framework efficiently searches for desired scenarios and converts them into ASAM OpenSCENARIO and IPG CarMaker text files. This methodology streamlines the scenario extraction process and enhances efficiency. Simulations are executed to validate the efficiency of the approach. The framework is presented based on a user-friendly web app and is accessible via the following link: https://github.com/ftgTUGraz/Chat2Scenario.

[60]  arXiv:2404.16150 [pdf, other]
Title: A Rollup Comparison Framework
Comments: 15 pages, 1 figure. Accepted at ChainScience 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Rollups are a popular blockchain paradigm where one blockchain network is anchored to a different blockchain network, typically though smart contracts and data commitments. The rollup executes transactions on its own network and periodically publishes them along with the state root of the rollup network. The state root is determined to be final by a protocol, often enforced by smart contracts on the anchoring blockchain, which may let the state roots be challenged or verify an accompanying validity proof. While this core functionality is universal to existing rollups, these systems have introduced unique features as they vie for users and market dominance. In this paper, we aim to classify ways in which these rollups differ in order to establish a common ground of understanding. We explore various dimensions in which these system can differ: familiarity, finality time, modularity, and maturity. The result is a framework that can be used to understand and compare the properties of rollups.

[61]  arXiv:2404.16152 [pdf, ps, other]
Title: Rethinking Grant-Free Protocol in mMTC
Comments: Submitted to IEEE for possible publication
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

This paper revisits the identity detection problem under the current grant-free protocol in massive machine-type communications (mMTC) by asking the following question: for stable identity detection performance, is it enough to permit active devices to transmit preambles without any handshaking with the base station (BS)? Specifically, in the current grant-free protocol, the BS blindly allocates a fixed length of preamble to devices for identity detection as it lacks the prior information on the number of active devices $K$. However, in practice, $K$ varies dynamically over time, resulting in degraded identity detection performance especially when $K$ is large. Consequently, the current grant-free protocol fails to ensure stable identity detection performance. To address this issue, we propose a two-stage communication protocol which consists of estimation of $K$ in Phase I and detection of identities of active devices in Phase II. The preamble length for identity detection in Phase II is dynamically allocated based on the estimated $K$ in Phase I through a table lookup manner such that the identity detection performance could always be better than a predefined threshold. In addition, we design an algorithm for estimating $K$ in Phase I, and exploit the estimated $K$ to reduce the computational complexity of the identity detector in Phase II. Numerical results demonstrate the effectiveness of the proposed two-stage communication protocol and algorithms.

[62]  arXiv:2404.16154 [pdf, other]
Title: A Comparative Analysis of Adversarial Robustness for Quantum and Classical Machine Learning Models
Comments: submitted to IEEE QCE24
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Quantum Physics (quant-ph)

Quantum machine learning (QML) continues to be an area of tremendous interest from research and industry. While QML models have been shown to be vulnerable to adversarial attacks much in the same manner as classical machine learning models, it is still largely unknown how to compare adversarial attacks on quantum versus classical models. In this paper, we show how to systematically investigate the similarities and differences in adversarial robustness of classical and quantum models using transfer attacks, perturbation patterns and Lipschitz bounds. More specifically, we focus on classification tasks on a handcrafted dataset that allows quantitative analysis for feature attribution. This enables us to get insight, both theoretically and experimentally, on the robustness of classification networks. We start by comparing typical QML model architectures such as amplitude and re-upload encoding circuits with variational parameters to a classical ConvNet architecture. Next, we introduce a classical approximation of QML circuits (originally obtained with Random Fourier Features sampling but adapted in this work to fit a trainable encoding) and evaluate this model, denoted Fourier network, in comparison to other architectures. Our findings show that this Fourier network can be seen as a "middle ground" on the quantum-classical boundary. While adversarial attacks successfully transfer across this boundary in both directions, we also show that regularization helps quantum networks to be more robust, which has direct impact on Lipschitz bounds and transfer attacks.

[63]  arXiv:2404.16155 [pdf, other]
Title: Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (cs.LG)

We introduce an assessment procedure for interactive segmentation models. Based on concepts from Bayesian Experimental Design, the procedure measures a model's understanding of point prompts and their correspondence with the desired segmentation mask. We show that Oracle Dice index measurements are insensitive or even misleading in measuring this property. We demonstrate the use of the proposed procedure on three interactive segmentation models and subsets of two large image segmentation datasets.

[64]  arXiv:2404.16158 [pdf, other]
Title: The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms
Comments: 33 pages, 24 figures
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

FPGAs are rarely mentioned when discussing the implementation of large machine learning applications, such as Large Language Models (LLMs), in the data center. There has been much evidence showing that single FPGAs can be competitive with GPUs in performance for some computations, especially for low latency, and often much more efficient when power is considered. This suggests that there is merit to exploring the use of multiple FPGAs for large machine learning applications. The challenge with using multiple FPGAs is that there is no commonly-accepted flow for developing and deploying multi-FPGA applications, i.e., there are no tools to describe a large application, map it to multiple FPGAs and then deploy the application on a multi-FPGA platform. In this paper, we explore the feasibility of implementing large transformers using multiple FPGAs by developing a scalable multi-FPGA platform and some tools to map large applications to the platform. We validate our approach by designing an efficient multi-FPGA version of the I-BERT transformer and implement one encoder using six FPGAs as a working proof-of-concept to show that our platform and tools work. Based on our proof-of-concept prototype and the estimations of performance using the latest FPGAs compared to GPUs, we conclude that there can be a place for FPGAs in the world of large machine learning applications. We demonstrate a promising first step that shows that with the right infrastructure and tools it is reasonable to continue to explore the possible benefits of using FPGAs for applications such as LLMs.

[65]  arXiv:2404.16159 [pdf, other]
Title: AFU: Actor-Free critic Updates in off-policy RL for continuous control
Comments: 19 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper presents AFU, an off-policy deep RL algorithm addressing in a new way the challenging "max-Q problem" in Q-learning for continuous action spaces, with a solution based on regression and conditional gradient scaling. AFU has an actor but its critic updates are entirely independent from it. As a consequence, the actor can be chosen freely. In the initial version, AFU-alpha, we employ the same stochastic actor as in Soft Actor-Critic (SAC), but we then study a simple failure mode of SAC and show how AFU can be modified to make actor updates less likely to become trapped in local optima, resulting in a second version of the algorithm, AFU-beta. Experimental results demonstrate the sample efficiency of both versions of AFU, marking it as the first model-free off-policy algorithm competitive with state-of-the-art actor-critic methods while departing from the actor-critic perspective.

[66]  arXiv:2404.16160 [pdf, other]
Title: Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant
Comments: Accepted at ICASSP 2024 EIHRC
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we firstly propose Domain-Specific Assistant Instructions based on AlexanderStreet therapy, and secondly, we use an adaption fine-tuning method and retrieval augmented generation method to improve pre-trained LLMs. Through quantitative evaluation of linguistic quality using automatic and human evaluation, we observe that pre-trained LLMs on Psychotherapy Assistant Instructions outperform state-of-the-art LLMs response baselines. Our Assistant-Instruction approach offers a half-annotation method to align pre-trained LLMs with instructions and provide pre-trained LLMs with more psychotherapy knowledge.

[67]  arXiv:2404.16162 [pdf, other]
Title: Scaling Lifelong Multi-Agent Path Finding to More Realistic Settings: Research Challenges and Opportunities
Comments: Accepted to Symposium on Combinatorial Search (SoCS), 2024
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI)

Multi-Agent Path Finding (MAPF) is the problem of moving multiple agents from starts to goals without collisions. Lifelong MAPF (LMAPF) extends MAPF by continuously assigning new goals to agents. We present our winning approach to the 2023 League of Robot Runners LMAPF competition, which leads us to several interesting research challenges and future directions. In this paper, we outline three main research challenges. The first challenge is to search for high-quality LMAPF solutions within a limited planning time (e.g., 1s per step) for a large number of agents (e.g., 10,000) or extremely high agent density (e.g., 97.7%). We present future directions such as developing more competitive rule-based and anytime MAPF algorithms and parallelizing state-of-the-art MAPF algorithms. The second challenge is to alleviate congestion and the effect of myopic behaviors in LMAPF algorithms. We present future directions, such as developing moving guidance and traffic rules to reduce congestion, incorporating future prediction and real-time search, and determining the optimal agent number. The third challenge is to bridge the gaps between the LMAPF models used in the literature and real-world applications. We present future directions, such as dealing with more realistic kinodynamic models, execution uncertainty, and evolving systems.

[68]  arXiv:2404.16163 [pdf, other]
Title: The Trembling-Hand Problem for LTLf Planning
Comments: The paper is accepted by IJCAI 2024
Subjects: Robotics (cs.RO); Formal Languages and Automata Theory (cs.FL)

Consider an agent acting to achieve its temporal goal, but with a "trembling hand". In this case, the agent may mistakenly instruct, with a certain (typically small) probability, actions that are not intended due to faults or imprecision in its action selection mechanism, thereby leading to possible goal failure. We study the trembling-hand problem in the context of reasoning about actions and planning for temporally extended goals expressed in Linear Temporal Logic on finite traces (LTLf), where we want to synthesize a strategy (aka plan) that maximizes the probability of satisfying the LTLf goal in spite of the trembling hand. We consider both deterministic and nondeterministic (adversarial) domains. We propose solution techniques for both cases by relying respectively on Markov Decision Processes and on Markov Decision Processes with Set-valued Transitions with LTLf objectives, where the set-valued probabilistic transitions capture both the nondeterminism from the environment and the possible action instruction errors from the agent. We formally show the correctness of our solution techniques and demonstrate their effectiveness experimentally through a proof-of-concept implementation.

[69]  arXiv:2404.16164 [pdf, other]
Title: Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Large language models (LLMs) have shown remarkable performance on a variety of NLP tasks, and are being rapidly adopted in a wide range of use cases. It is therefore of vital importance to holistically evaluate the factuality of their generated outputs, as hallucinations remain a challenging issue.
In this work, we focus on assessing LLMs' ability to recall factual knowledge learned from pretraining, and the factors that affect this ability. To that end, we construct FACT-BENCH, a representative benchmark covering 20 domains, 134 property types, 3 answer types, and different knowledge popularity levels. We benchmark 31 models from 10 model families and provide a holistic assessment of their strengths and weaknesses. We observe that instruction-tuning hurts knowledge recall, as pretraining-only models consistently outperform their instruction-tuned counterparts, and positive effects of model scaling, as larger models outperform smaller ones for all model families. However, the best performance from GPT-4 still represents a large gap with the upper-bound. We additionally study the role of in-context exemplars using counterfactual demonstrations, which lead to significant degradation of factual knowledge recall for large models. By further decoupling model known and unknown knowledge, we find the degradation is attributed to exemplars that contradict a model's known knowledge, as well as the number of such exemplars. Lastly, we fine-tune LLaMA-7B in different settings of known and unknown knowledge. In particular, fine-tuning on a model's known knowledge is beneficial, and consistently outperforms fine-tuning on unknown and mixed knowledge. We will make our benchmark publicly available.

[70]  arXiv:2404.16168 [pdf, other]
Title: The Over-Certainty Phenomenon in Modern UDA Algorithms
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

When neural networks are confronted with unfamiliar data that deviate from their training set, this signifies a domain shift. While these networks output predictions on their inputs, they typically fail to account for their level of familiarity with these novel observations. This challenge becomes even more pronounced in resource-constrained settings, such as embedded systems or edge devices. To address such challenges, we aim to recalibrate a neural network's decision boundaries in relation to its cognizance of the data it observes, introducing an approach we coin as certainty distillation. While prevailing works navigate unsupervised domain adaptation (UDA) with the goal of curtailing model entropy, they unintentionally birth models that grapple with calibration inaccuracies - a dilemma we term the over-certainty phenomenon. In this paper, we probe the drawbacks of this traditional learning model. As a solution to the issue, we propose a UDA algorithm that not only augments accuracy but also assures model calibration, all while maintaining suitability for environments with limited computational resources.

[71]  arXiv:2404.16169 [pdf, other]
Title: Interpretable Machine Learning Models for Predicting the Next Targets of Activist Funds
Authors: Minwu Kim
Comments: 15 pages
Subjects: Computational Engineering, Finance, and Science (cs.CE); Statistical Finance (q-fin.ST)

This work develops a predictive model to identify potential targets of activist investment funds, which strategically acquire significant corporate stakes to drive operational and strategic improvements and enhance shareholder value. Predicting these targets is crucial for companies to mitigate intervention risks, for activists to select optimal targets, and for investors to capitalize on associated stock price gains. Our analysis utilizes data from the Russell 3000 index from 2016 to 2022. We tested 123 variations of models using different data imputation, oversampling, and machine learning methods, achieving a top AUC-ROC of 0.782. This demonstrates the model's effectiveness in identifying likely targets of activist funds. We applied the Shapley value method to determine the most influential factors in a company's susceptibility to activist investment. This interpretative approach provides clear insights into the driving forces behind activist targeting. Our model offers stakeholders a strategic tool for proactive corporate governance and investment strategy, enhancing understanding of the dynamics of activist investing.

[72]  arXiv:2404.16171 [pdf, other]
Title: An analysis of the effects of sharing research data, code, and preprints on citations
Subjects: Digital Libraries (cs.DL)

Calls to make scientific research more open have gained traction with a range of societal stakeholders. Open Science practices include but are not limited to the early sharing of results via preprints and openly sharing outputs such as data and code to make research more reproducible and extensible. Existing evidence shows that adopting Open Science practices has effects in several domains. In this study, we investigate whether adopting one or more Open Science practices leads to significantly higher citations for an associated publication, which is one form of academic impact. We use a novel dataset known as Open Science Indicators, produced by PLOS and DataSeer, which includes all PLOS publications from 2018 to 2023 as well as a comparison group sampled from the PMC Open Access Subset. In total, we analyze circa 122'000 publications. We calculate publication and author-level citation indicators and use a broad set of control variables to isolate the effect of Open Science Indicators on received citations. We show that Open Science practices are adopted to different degrees across scientific disciplines. We find that the early release of a publication as a preprint correlates with a significant positive citation advantage of about 20.2% on average. We also find that sharing data in an online repository correlates with a smaller yet still positive citation advantage of 4.3% on average. However, we do not find a significant citation advantage for sharing code. Further research is needed on additional or alternative measures of impact beyond citations. Our results are likely to be of interest to researchers, as well as publishers, research funders, and policymakers.

[73]  arXiv:2404.16174 [pdf, other]
Title: MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models
Comments: 14 pages, 6 figures, ACM FAccT 2024
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The recent prevalence of publicly accessible, large medical imaging datasets has led to a proliferation of artificial intelligence (AI) models for cardiovascular image classification and analysis. At the same time, the potentially significant impacts of these models have motivated the development of a range of explainable AI (XAI) methods that aim to explain model predictions given certain image inputs. However, many of these methods are not developed or evaluated with domain experts, and explanations are not contextualized in terms of medical expertise or domain knowledge. In this paper, we propose a novel framework and python library, MiMICRI, that provides domain-centered counterfactual explanations of cardiovascular image classification models. MiMICRI helps users interactively select and replace segments of medical images that correspond to morphological structures. From the counterfactuals generated, users can then assess the influence of each segment on model predictions, and validate the model against known medical facts. We evaluate this library with two medical experts. Our evaluation demonstrates that a domain-centered XAI approach can enhance the interpretability of model explanations, and help experts reason about models in terms of relevant domain knowledge. However, concerns were also surfaced about the clinical plausibility of the counterfactuals generated. We conclude with a discussion on the generalizability and trustworthiness of the MiMICRI framework, as well as the implications of our findings on the development of domain-centered XAI methods for model interpretability in healthcare contexts.

[74]  arXiv:2404.16176 [pdf, other]
Title: Unweighted Layered Graph Traversal
Subjects: Data Structures and Algorithms (cs.DS); Metric Geometry (math.MG)

Introduced by Papadimitriou and Yannakakis in 1989, layered graph traversal is an important problem in online algorithms and mobile computing that has been studied for several decades, and which now is essentially resolved in its original formulation. In this paper, we demonstrate that what appears to be an innocuous modification of the problem actually leads to a drastic (exponential) reduction of the competitive ratio. Specifically, we present an algorithm that is $O(\log^2 w)$-competitive for traversing unweighted layered graphs of width $w$. Our technique is based on a simple entropic regularizer, which evolves as the agent progresses in the layered graph. Our algorithm is randomized and simply maintains that at all layers, the probability distribution of the position of the mobile agent maximizes the entropic regularizer.

[75]  arXiv:2404.16177 [pdf, other]
Title: Advancing Recommender Systems by mitigating Shilling attacks
Comments: Published in IEEE, Proceedings of 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Considering the premise that the number of products offered grow in an exponential fashion and the amount of data that a user can assimilate before making a decision is relatively small, recommender systems help in categorizing content according to user preferences. Collaborative filtering is a widely used method for computing recommendations due to its good performance. But, this method makes the system vulnerable to attacks which try to bias the recommendations. These attacks, known as 'shilling attacks' are performed to push an item or nuke an item in the system. This paper proposes an algorithm to detect such shilling profiles in the system accurately and also study the effects of such profiles on the recommendations.

[76]  arXiv:2404.16179 [pdf, ps, other]
Title: S2DEVFMAP: Self-Supervised Learning Framework with Dual Ensemble Voting Fusion for Maximizing Anomaly Prediction in Timeseries
Subjects: Machine Learning (cs.LG)

Anomaly detection plays a crucial role in industrial settings, particularly in maintaining the reliability and optimal performance of cooling systems. Traditional anomaly detection methods often face challenges in handling diverse data characteristics and variations in noise levels, resulting in limited effectiveness. And yet traditional anomaly detection often relies on application of single models. This work proposes a novel, robust approach using five heterogeneous independent models combined with a dual ensemble fusion of voting techniques. Diverse models capture various system behaviors, while the fusion strategy maximizes detection effectiveness and minimizes false alarms. Each base autoencoder model learns a unique representation of the data, leveraging their complementary strengths to improve anomaly detection performance. To increase the effectiveness and reliability of final anomaly prediction, dual ensemble technique is applied. This approach outperforms in maximizing the coverage of identifying anomalies. Experimental results on a real-world dataset of industrial cooling system data demonstrate the effectiveness of the proposed approach. This approach can be extended to other industrial applications where anomaly detection is critical for ensuring system reliability and preventing potential malfunctions.

[77]  arXiv:2404.16180 [pdf, ps, other]
Title: Blind Federated Learning without initial model
Journal-ref: Journal of Big Data volume 11, Article number: 56 (2024)
Subjects: Machine Learning (cs.LG)

Federated learning is an emerging machine learning approach that allows the construction of a model between several participants who hold their own private data. This method is secure and privacy-preserving, suitable for training a machine learning model using sensitive data from different sources, such as hospitals. In this paper, the authors propose two innovative methodologies for Particle Swarm Optimisation-based federated learning of Fuzzy Cognitive Maps in a privacy-preserving way. In addition, one relevant contribution this research includes is the lack of an initial model in the federated learning process, making it effectively blind. This proposal is tested with several open datasets, improving both accuracy and precision.

[78]  arXiv:2404.16183 [pdf, ps, other]
Title: ABCD: Trust enhanced Attention based Convolutional Autoencoder for Risk Assessment
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Anomaly detection in industrial systems is crucial for preventing equipment failures, ensuring risk identification, and maintaining overall system efficiency. Traditional monitoring methods often rely on fixed thresholds and empirical rules, which may not be sensitive enough to detect subtle changes in system health and predict impending failures. To address this limitation, this paper proposes, a novel Attention-based convolutional autoencoder (ABCD) for risk detection and map the risk value derive to the maintenance planning. ABCD learns the normal behavior of conductivity from historical data of a real-world industrial cooling system and reconstructs the input data, identifying anomalies that deviate from the expected patterns. The framework also employs calibration techniques to ensure the reliability of its predictions. Evaluation results demonstrate that with the attention mechanism in ABCD a 57.4% increase in performance and a reduction of false alarms by 9.37% is seen compared to without attention. The approach can effectively detect risks, the risk priority rank mapped to maintenance, providing valuable insights for cooling system designers and service personnel. Calibration error of 0.03% indicates that the model is well-calibrated and enhances model's trustworthiness, enabling informed decisions about maintenance strategies

[79]  arXiv:2404.16188 [pdf, other]
Title: Pearls from Pebbles: Improved Confidence Functions for Auto-labeling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the \emph{optimal} TBAL confidence function. We develop a tractable version of the framework to obtain \texttt{Colander} (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of our method \texttt{Colander} and compare it against methods designed for calibration. \texttt{Colander} achieves up to 60\% improvements on coverage over the baselines while maintaining auto-labeling error below $5\%$ and using the same amount of labeled data as the baselines.

[80]  arXiv:2404.16189 [pdf, other]
Title: Structure Preserving PINN for Solving Time Dependent PDEs with Periodic Boundary
Subjects: Numerical Analysis (math.NA)

We present a structure preserving PINN for solving a series of time dependent PDEs with periodic boundary. Our method can incorporate the periodic boundary condition as the natural output of any deep neural net, hence significantly improving the training accuracy of baseline PINN. Together with mini-batching and other PINN variants (SA-PINN, RBA-PINN, etc.), our structure preserving PINN can even handle stiff PDEs for modeling a wide range of convection-diffusion and reaction-diffusion processes. We demonstrate the effectiveness of our PINNs on various PDEs from Allen Cahn, Gray Scott to nonlinear Schrodinger.

[81]  arXiv:2404.16192 [pdf, other]
Title: Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
Comments: Clinical NLP @ NAACL 2024
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question-answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical. We propose a medical vision-language model that integrates large vision and language models adapted for the medical domain. This model goes through three stages of parameter-efficient training using three separate biomedical and radiology multi-modal visual and text datasets. The proposed model achieves state-of-the-art performance on the SLAKE 1.0 medical VQA (MedVQA) dataset with an overall accuracy of 87.5% and demonstrates strong performance on another MedVQA dataset, VQA-RAD, achieving an overall accuracy of 73.2%.

[82]  arXiv:2404.16193 [pdf, other]
Title: Improving Multi-label Recognition using Class Co-Occurrence Probabilities
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)

Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-images datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between a pair of classes. We propose a framework to extend the independent classifiers by incorporating the co-occurrence information for object pairs to improve the performance of independent classifiers. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, by refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.

[83]  arXiv:2404.16195 [pdf, other]
Title: A Game-Theoretic Analysis of Auditing Differentially Private Algorithms with Epistemically Disparate Herd
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)

Privacy-preserving AI algorithms are widely adopted in various domains, but the lack of transparency might pose accountability issues. While auditing algorithms can address this issue, machine-based audit approaches are often costly and time-consuming. Herd audit, on the other hand, offers an alternative solution by harnessing collective intelligence. Nevertheless, the presence of epistemic disparity among auditors, resulting in varying levels of expertise and access to knowledge, may impact audit performance. An effective herd audit will establish a credible accountability threat for algorithm developers, incentivizing them to uphold their claims. In this study, our objective is to develop a systematic framework that examines the impact of herd audits on algorithm developers using the Stackelberg game approach. The optimal strategy for auditors emphasizes the importance of easy access to relevant information, as it increases the auditors' confidence in the audit process. Similarly, the optimal choice for developers suggests that herd audit is viable when auditors face lower costs in acquiring knowledge. By enhancing transparency and accountability, herd audit contributes to the responsible development of privacy-preserving algorithms.

[84]  arXiv:2404.16198 [pdf, ps, other]
Title: Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model
Subjects: Computation and Language (cs.CL)

Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the most related sentences of the records to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model showed the overall micro and macro F measures of 0.9061 and 0.8060 which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria received promising scores. Besides, we proposed a method of extractive summarization with the aid of SNOMED CT ontology that can be also applied to other medical texts.

[85]  arXiv:2404.16205 [pdf, other]
Title: AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results
Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed methods must process 30 FHD frames under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.

[86]  arXiv:2404.16206 [pdf, other]
Title: Knowledge Graph Completion using Structural and Textual Embeddings
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Knowledge Graphs (KGs) are widely employed in artificial intelligence applications, such as question-answering and recommendation systems. However, KGs are frequently found to be incomplete. While much of the existing literature focuses on predicting missing nodes for given incomplete KG triples, there remains an opportunity to complete KGs by exploring relations between existing nodes, a task known as relation prediction. In this study, we propose a relations prediction model that harnesses both textual and structural information within KGs. Our approach integrates walks-based embeddings with language model embeddings to effectively represent nodes. We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.

[87]  arXiv:2404.16208 [pdf, other]
Title: GPU-RANC: A CUDA Accelerated Simulation Framework for Neuromorphic Architectures
Comments: Accepted for publication in Neuro-Inspired Computational Elements (NICE) Workshop 2024
Subjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)

Open-source simulation tools play a crucial role for neuromorphic application engineers and hardware architects to investigate performance bottlenecks and explore design optimizations before committing to silicon. Reconfigurable Architecture for Neuromorphic Computing (RANC) is one such tool that offers ability to execute pre-trained Spiking Neural Network (SNN) models within a unified ecosystem through both software-based simulation and FPGA-based emulation. RANC has been utilized by the community with its flexible and highly parameterized design to study implementation bottlenecks, tune architectural parameters or modify neuron behavior based on application insights and study the trade space on hardware performance and network accuracy. In designing architectures for use in neuromorphic computing, there are an incredibly large number of configuration parameters such as number and precision of weights per neuron, neuron and axon counts per core, network topology, and neuron behavior. To accelerate such studies and provide users with a streamlined productive design space exploration, in this paper we introduce the GPU-based implementation of RANC. We summarize our parallelization approach and quantify the speedup gains achieved with GPU-based tick-accurate simulations across various use cases. We demonstrate up to 780 times speedup compared to serial version of the RANC simulator based on a 512 neuromorphic core MNIST inference application. We believe that the RANC ecosystem now provides a much more feasible avenue in the research of exploring different optimizations for accelerating SNNs and performing richer studies by enabling rapid convergence to optimized neuromorphic architectures.

[88]  arXiv:2404.16210 [pdf, other]
Title: Efficient Data Management for IPFS dApps
Comments: 19 pages, 9 figures
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)

Inefficient data management has been the Achilles heel of blockchain-based decentralized applications (dApps). An off-chain storage layer, which lies between the application and the blockchain layers, can improve space efficiency and data availability with erasure codes and decentralized maintenance. This paper presents two fundamental components of such storage layer designed and implemented for the IPFS network. The IPFS Community is a component built on top of the IPFS network that encodes and decodes data before uploading to the network. Since data is encoded with alpha entanglement codes, the solution requires less storage space than the native IPFS solution which replicates data by pinning content with the IPFS Cluster. To detect and repair failures in a timely manner, we introduce the monitoring and repair component. This novel component is activated by any node and distributes the load of repairs among various nodes. These two components are implemented as pluggable modules, and can, therefore, be easily migrated to other distributed file systems by adjusting the connector component.

[89]  arXiv:2404.16212 [pdf, other]
Title: An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape
Comments: Accepted to IEEE S&P 2024; 19 pages, 10 figures
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models, can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such \emph{user-customized generative models} that are publicly available today. We discuss new machine learning approaches based on content-agnostic features, and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of \textit{vision foundation models} -- machine learning models trained on broad data that can be easily adapted to several downstream tasks -- can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples \textit{without adding any adversarial noise}, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.

[90]  arXiv:2404.16213 [pdf, ps, other]
Title: MAG$π$!: The Role of Replication in Typing Failure-Prone Communication
Subjects: Programming Languages (cs.PL)

MAG$\pi$ is a Multiparty, Asynchronous and Generalised $\pi$-calculus that introduces timeouts into session types as a means of reasoning about failure-prone communication. Its type system guarantees that all possible message-loss is handled by timeout branches. In this work, we argue that the previous is unnecessarily strict. We present MAG$\pi$!, an extension serving as the first introduction of replication into Multiparty Session Types (MPST). Replication is a standard $\pi$-calculus construct used to model infinitely available servers. We lift this construct to type-level, and show that it simplifies specification of distributed client-server interactions. We prove properties relevant to generalised MPST: subject reduction, session fidelity and process property verification.

[91]  arXiv:2404.16216 [pdf, other]
Title: ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)

An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on-the-fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods--both traditional navigation agents based on spatial novelty and visual exploration as well as existing state-of-the-art methods.

[92]  arXiv:2404.16217 [pdf, other]
Title: Fault-Tolerant Bounded Flow Preservers
Comments: 12 pages, 2 figures
Subjects: Data Structures and Algorithms (cs.DS)

Given a directed graph $G = (V, E)$ with $n$ vertices, $m$ edges and a designated source vertex $s\in V$, we consider the question of finding a sparse subgraph $H$ of $G$ that preserves the flow from $s$ up to a given threshold $\lambda$ even after failure of $k$ edges. We refer to such subgraphs as $(\lambda,k)$-fault-tolerant bounded-flow-preserver ($(\lambda,k)$-FT-BFP). Formally, for any $F \subseteq E$ of at most $k$ edges and any $v\in V$, the $(s, v)$-max-flow in $H \setminus F$ is equal to $(s, v)$-max-flow in $G \setminus F$, if the latter is bounded by $\lambda$, and at least $\lambda$ otherwise. Our contributions are summarized as follows:
1. We provide a polynomial time algorithm that given any graph $G$ constructs a $(\lambda,k)$-FT-BFP of $G$ with at most $\lambda 2^kn$ edges.
2. We also prove a matching lower bound of $\Omega(\lambda 2^kn)$ on the size of $(\lambda,k)$-FT-BFP. In particular, we show that for every $\lambda,k,n\geq 1$, there exists an $n$-vertex directed graph whose optimal $(\lambda,k)$-FT-BFP contains $\Omega(\min\{2^k\lambda n,n^2\})$ edges.
3. Furthermore, we show that the problem of computing approximate $(\lambda,k)$-FT-BFP is NP-hard for any approximation ratio that is better than $O(\log(\lambda^{-1} n))$.

[93]  arXiv:2404.16218 [pdf, other]
Title: Efficient NAS with FaDE on Hierarchical Spaces
Journal-ref: Advances in Intelligent Data Analysis XXII. IDA 2024. Lecture Notes in Computer Science, vol 14642. Springer, Cham
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Neural architecture search (NAS) is a challenging problem. Hierarchical search spaces allow for cheap evaluations of neural network sub modules to serve as surrogate for architecture evaluations. Yet, sometimes the hierarchy is too restrictive or the surrogate fails to generalize. We present FaDE which uses differentiable architecture search to obtain relative performance predictions on finite regions of a hierarchical NAS space. The relative nature of these ranks calls for a memory-less, batch-wise outer search algorithm for which we use an evolutionary algorithm with pseudo-gradient descent. FaDE is especially suited on deep hierarchical, respectively multi-cell search spaces, which it can explore by linear instead of exponential cost and therefore eliminates the need for a proxy search space.
Our experiments show that firstly, FaDE-ranks on finite regions of the search space correlate with corresponding architecture performances and secondly, the ranks can empower a pseudo-gradient evolutionary search on the complete neural architecture search space.

[94]  arXiv:2404.16219 [pdf, other]
Title: Can Increasing the Hit Ratio Hurt Cache Throughput?
Subjects: Performance (cs.PF)

Software caches are an intrinsic component of almost every computer system. Consequently, caching algorithms, particularly eviction policies, are the topic of many papers. Almost all these prior papers evaluate the caching algorithm based on its hit ratio, namely the fraction of requests that are found in the cache, as opposed to disk. The hit ratio is viewed as a proxy for traditional performance metrics like system throughput or response time. Intuitively it makes sense that higher hit ratio should lead to higher throughput (and lower response time), since more requests are found in the cache (low access time) as opposed to the disk (high access time).
This paper challenges this intuition. We show that increasing the hit ratio can actually hurt the throughput (and response time) for many caching algorithms. Our investigation follows a three-pronged approach involving (i) queueing modeling and analysis, (ii) implementation and measurement, and (iii) simulation to validate the accuracy of the queueing model. We also show that the phenomenon of throughput decreasing at higher hit ratios is likely to be more pronounced in future systems, where the trend is towards faster disks and higher numbers of cores per CPU.

[95]  arXiv:2404.16220 [pdf, other]
Title: When does a bent concatenation not belong to the completed Maiorana-McFarland class?
Comments: This is the authors' version of the camera-ready version to be presented at the 2024 IEEE International Symposium on Information Theory (ISIT 2024)
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM); Combinatorics (math.CO)

Every Boolean bent function $f$ can be written either as a concatenation $f=f_1||f_2$ of two complementary semi-bent functions $f_1,f_2$; or as a concatenation $f=f_1||f_2||f_3||f_4$ of four Boolean functions $f_1,f_2,f_3,f_4$, all of which are simultaneously bent, semi-bent, or 5-valued spectra-functions. In this context, it is essential to ask: When does a bent concatenation $f$ (not) belong to the completed Maiorana-McFarland class $\mathcal{M}^\#$? In this article, we answer this question completely by providing a full characterization of the structure of $\mathcal{M}$-subspaces for the concatenation of the form $f=f_1||f_2$ and $f=f_1||f_2||f_3||f_4$, which allows us to specify the necessary and sufficient conditions so that $f$ is outside $\mathcal{M}^\#$. Based on these conditions, we propose several explicit design methods of specifying bent functions outside $\mathcal{M}^\#$ in the special case when $f=g||h||g||(h+1)$, where $g$ and $h$ are bent functions.

[96]  arXiv:2404.16221 [pdf, other]
Title: NeRF-XL: Scaling NeRFs with Multiple GPUs
Comments: Webpage: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Graphics (cs.GR)

We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25km^2 city area.

[97]  arXiv:2404.16222 [pdf, other]
Title: Step Differences in Instructional Video
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Comparing a user video to a reference how-to video is a key requirement for AR/VR technology delivering personalized assistance tailored to the user's progress. However, current approaches for language-based assistance can only answer questions about a single video. We propose an approach that first automatically generates large amounts of visual instruction tuning data involving pairs of videos from HowTo100M by leveraging existing step annotations and accompanying narrations, and then trains a video-conditioned language model to jointly reason across multiple raw videos. Our model achieves state-of-the-art performance at identifying differences between video pairs and ranking videos based on the severity of these differences, and shows promising ability to perform general reasoning over multiple videos.

[98]  arXiv:2404.16223 [pdf, other]
Title: Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey
Comments: CVPR 2024 - NTIRE Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during thee challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution.

[99]  arXiv:2404.16224 [pdf, ps, other]
Title: Tractable Conjunctive Queries over Static and Dynamic Relations
Subjects: Databases (cs.DB)

We investigate the evaluation of conjunctive queries over static and dynamic relations. While static relations are given as input and do not change, dynamic relations are subject to inserts and deletes.
We characterise syntactically three classes of queries that admit constant update time and constant enumeration delay. We call such queries tractable. Depending on the class, the preprocessing time is linear, polynomial, or exponential (under data complexity, so the query size is constant).
To decide whether a query is tractable, it does not suffice to analyse separately the sub-query over the static relations and the sub-query over the dynamic relations. Instead, we need to take the interaction between the static and the dynamic relations into account. Even when the sub-query over the dynamic relations is not tractable, the overall query can become tractable if the dynamic relations are sufficiently constrained by the static ones.

[100]  arXiv:2404.16226 [pdf, ps, other]
Title: Computational analysis of the language of pain: a systematic review
Comments: 38 pages, 16 tables, 2 figures, systematic review
Subjects: Computation and Language (cs.CL)

Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included previous linguistic knowledge on pain utterances in their experimental setup. Most studies targeted their outcomes for physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. The least studied dimensions of pain were affective and sociocultural. Only two studies measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This study found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient-empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.

[101]  arXiv:2404.16231 [pdf, ps, other]
Title: A proof theory of (omega-)context-free languages, via non-wellfounded proofs
Comments: 24 pages, 5 figures
Subjects: Logic in Computer Science (cs.LO); Logic (math.LO)

We investigate the proof theory of regular expressions with fixed points, construed as a notation for (omega-)context-free grammars. Starting with a hypersequential system for regular expressions due to Das and Pous, we define its extension by least fixed points and prove soundness and completeness of its non-wellfounded proofs for the standard language model. From here we apply proof-theoretic techniques to recover an infinitary axiomatisation of the resulting equational theory, complete for inclusions of context-free languages. Finally, we extend our syntax by greatest fixed points, now computing omega-context-free languages. We show the soundness and completeness of the corresponding system using a mixture of proof-theoretic and game-theoretic techniques.

[102]  arXiv:2404.16232 [pdf, other]
Title: SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In the context of prediction-as-a-service, concerns about the privacy of the data and the model have been brought up and tackled via secure inference protocols. These protocols are built up by using single or multiple cryptographic tools designed under a variety of different security assumptions.
In this paper, we introduce SECO, a secure inference protocol that enables a user holding an input data vector and multiple server nodes deployed with a split neural network model to collaboratively compute the prediction, without compromising either party's data privacy. We extend prior work on secure inference that requires the entire neural network model to be located on a single server node, to a multi-server hierarchy, where the user communicates to a gateway server node, which in turn communicates to remote server nodes. The inference task is split across the server nodes and must be performed over an encrypted copy of the data vector.
We adopt multiparty homomorphic encryption and multiparty garbled circuit schemes, making the system secure against dishonest majority of semi-honest servers as well as protecting the partial model structure from the user. We evaluate SECO on multiple models, achieving the reduction of computation and communication cost for the user, making the protocol applicable to user's devices with limited resources.

[103]  arXiv:2404.16233 [pdf, other]
Title: AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundational models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcases AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, aligning with specialized toolboxes designed for such purposes.

[104]  arXiv:2404.16240 [pdf, other]
Title: A communication protocol based on NK boolean networks for coordinating collective action
Authors: Yori Ong
Comments: 13 pages, 4 figures
Subjects: Social and Information Networks (cs.SI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

In this paper, I describe a digital social communication protocol (Gridt) based on Kauffman's NK boolean networks. The main assertion is that a communication network with this topology supports infinitely scalable self-organization of collective action without requiring hierarchy or central control. The paper presents the functionality of this protocol and substantiates the following propositions about its function and implications: (1) Communication via NK boolean networks facilitates coordination on collective action games for any variable number of users, and justifies the assumption that the game's payoff structure is common knowledge; (2) Use of this protocol increases its users' transfer empowerment, a form of intrinsic motivation that motivates coordinated action independent of the task or outcome; (3) Communication via this network can be considered 'cheap talk' and benefits the strategy of players with aligned interests, but not of players with conflicting interests; (4) Absence of significant barriers for its realization warrants a timely and continuing discussion on the ethics and implications of this technology; (5) Full realization of the technology's potential calls for a free-to-use service with maximal transparency of design and associated economic incentives.

[105]  arXiv:2404.16241 [pdf, other]
Title: Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization
Subjects: Cryptography and Security (cs.CR)

This study develops a novel framework for privacy-preserving data analytics, addressing the critical challenge of balancing data utility with privacy concerns. We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction while masking sensitive attributes and an Expectation Maximization (EM) approach optimized for structured data privacy. Applied to datasets such as Modified MNIST and CelebrityA, our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy. Our experimental results confirm that these approaches achieve superior privacy protection and retain high utility, making them viable for practical applications where both aspects are crucial. The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types and establishing new benchmarks for utility and confidentiality in data analytics.

[106]  arXiv:2404.16243 [pdf, ps, other]
Title: muRelBench: MicroBenchmarks for Zonotope Domains
Comments: NEAT paper for SAS 2024
Subjects: Logic in Computer Science (cs.LO); Software Engineering (cs.SE)

We present \texttt{muRelBench}, a suite of synthetic benchmarks for weakly-relational abstract domains and their operations. For example, the benchmarks can support experimental evaluations of proposed algorithms such as domain closure.

[107]  arXiv:2404.16244 [pdf, other]
Title: The Ethics of Advanced AI Assistants
Subjects: Computers and Society (cs.CY)

This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.

[108]  arXiv:2404.16248 [pdf, other]
Title: URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Linking a claim to grounded references is a critical ability to fulfill human demands for authentic and reliable information. Current studies are limited to specific tasks like information retrieval or semantic matching, where the claim-reference relationships are unique and fixed, while the referential knowledge linking (RKL) in real-world can be much more diverse and complex. In this paper, we propose universal referential knowledge linking (URL), which aims to resolve diversified referential knowledge linking tasks by one unified model. To this end, we propose a LLM-driven task-instructed representation compression, as well as a multi-view learning approach, in order to effectively adapt the instruction following and semantic understanding abilities of LLMs to referential knowledge linking. Furthermore, we also construct a new benchmark to evaluate ability of models on referential knowledge linking tasks across different scenarios. Experiments demonstrate that universal RKL is challenging for existing approaches, while the proposed framework can effectively resolve the task across various scenarios, and therefore outperforms previous approaches by a large margin.

[109]  arXiv:2404.16250 [pdf, other]
Title: Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs
Comments: Georgetown University Round Table (GURT) 2023
Subjects: Computation and Language (cs.CL)

Searching dependency graphs and manipulating them can be a time consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.

[110]  arXiv:2404.16251 [pdf, ps, other]
Title: Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.

[111]  arXiv:2404.16255 [pdf, other]
Title: Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Modern face recognition systems utilize deep neural networks to extract salient features from a face. These features denote embeddings in latent space and are often stored as templates in a face recognition system. These embeddings are susceptible to data leakage and, in some cases, can even be used to reconstruct the original face image. To prevent compromising identities, template protection schemes are commonly employed. However, these schemes may still not prevent the leakage of soft biometric information such as age, gender and race. To alleviate this issue, we propose a novel technique that combines Fully Homomorphic Encryption (FHE) with an existing template protection scheme known as PolyProtect. We show that the embeddings can be compressed and encrypted using FHE and transformed into a secure PolyProtect template using polynomial transformation, for additional protection. We demonstrate the efficacy of the proposed approach through extensive experiments on multiple datasets. Our proposed approach ensures irreversibility and unlinkability, effectively preventing the leakage of soft biometric attributes from face embeddings without compromising recognition accuracy.

[112]  arXiv:2404.16256 [pdf, other]
Title: Probabilistic Tracker Management Policies for Low-Cost and Scalable Rowhammer Mitigation
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)

This paper focuses on mitigating DRAM Rowhammer attacks. In recent years, solutions like TRR have been deployed in DDR4 DRAM to track aggressor rows and then issue a mitigative action by refreshing neighboring victim rows. Unfortunately, such in-DRAM solutions are resource-constrained (only able to provision few tens of counters to track aggressor rows) and are prone to thrashing based attacks, that have been used to fool them. Secure alternatives for in-DRAM trackers require tens of thousands of counters.
In this work, we demonstrate secure and scalable rowhammer mitigation using resource-constrained trackers. Our key idea is to manage such trackers with probabilistic management policies (PROTEAS). PROTEAS includes component policies like request-stream sampling and random evictions which enable thrash-resistance for resource-constrained trackers. We show that PROTEAS can secure small in-DRAM trackers (with 16 counters per DRAM bank) even when Rowhammer thresholds drop to 500 while incurring less than 3% slowdown. Moreover, we show that PROTEAS significantly outperforms a recent similar probabilistic proposal from Samsung (called DSAC) while achieving 11X - 19X the resilience against Rowhammer.

[113]  arXiv:2404.16257 [pdf, other]
Title: Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Comments: 19 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Translating major language resources to build minor language resources becomes a widely-used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed to the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we have achieved a considerable improvement in translation quality itself, along with its effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that enhances the performance of the trained model by 2.690 points for the web page ranking (WPR) task, and 0.845 for the question generation (QG) task in the XGLUE benchmark.

[114]  arXiv:2404.16259 [pdf, other]
Title: An Experiment with Electric Guitar Signals for Exploring the Virtuosity based on the Entropy of Music
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

We analyze the concept of virtuosity as a collective attribute in music and its relationship with the entropy based on an experiment that compares two sets of digital signals played by composer-performer electric guitarists. Based on an interdisciplinary approach related to the complex systems, we computed the spectrum of signals, identified statistical distributions that best describe them, and measured the Shannon entropy to establish their diversity. Findings suggested that virtuosity might be related to a range of entropy values that identify levels of diversity of the frequency components of audio signals. Despite the presence of different values of entropy in the two sets of signals, they are statistically similar. Therefore, entropy values can be interpreted as levels of virtuosity in music.

[115]  arXiv:2404.16260 [pdf, other]
Title: OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search
Comments: 8 pages, 5 figures, to be published as an oral paper in TheWebConf Industry Track 2024
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In this paper, we present OmniSearchSage, a versatile and scalable system for understanding search queries, pins, and products for Pinterest search. We jointly learn a unified query embedding coupled with pin and product embeddings, leading to an improvement of $>8\%$ relevance, $>7\%$ engagement, and $>5\%$ ads CTR in Pinterest's production search system. The main contributors to these gains are improved content understanding, better multi-task learning, and real-time serving. We enrich our entity representations using diverse text derived from image captions from a generative LLM, historical engagement, and user-curated boards. Our multitask learning setup produces a single search query embedding in the same space as pin and product embeddings and compatible with pre-existing pin and product embeddings. We show the value of each feature through ablation studies, and show the effectiveness of a unified model compared to standalone counterparts. Finally, we share how these embeddings have been deployed across the Pinterest search stack, from retrieval to ranking, scaling to serve $300k$ requests per second at low latency. Our implementation of this work is available at https://github.com/pinterest/atg-research/tree/main/omnisearchsage.

[116]  arXiv:2404.16262 [pdf, other]
Title: Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains
Comments: To appear at NAACL 2024 Findings
Subjects: Computation and Language (cs.CL)

People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords. Figuring out the meaning of indirect answers is challenging, even for large language models. In this paper, we investigate this problem working with dialogues from multiple domains. We present new benchmarks in three diverse domains: movie scripts, tennis interviews, and airline customer service. We present an approach grounded on distant supervision and blended training to quickly adapt to a new dialogue domain. Experimental results show that our approach is never detrimental and yields F1 improvements as high as 11-34%.

[117]  arXiv:2404.16266 [pdf, other]
Title: A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation
Comments: 8 pages, 16 figures, GECCO 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

As one of the emerging challenges in Automated Machine Learning, the Hardware-aware Neural Architecture Search (HW-NAS) tasks can be treated as black-box multi-objective optimization problems (MOPs). An important application of HW-NAS is real-time semantic segmentation, which plays a pivotal role in autonomous driving scenarios. The HW-NAS for real-time semantic segmentation inherently needs to balance multiple optimization objectives, including model accuracy, inference speed, and hardware-specific considerations. Despite its importance, benchmarks have yet to be developed to frame such a challenging task as multi-objective optimization. To bridge the gap, we introduce a tailored streamline to transform the task of HW-NAS for real-time semantic segmentation into standard MOPs. Building upon the streamline, we present a benchmark test suite, CitySeg/MOP, comprising fifteen MOPs derived from the Cityscapes dataset. The CitySeg/MOP test suite is integrated into the EvoXBench platform to provide seamless interfaces with various programming languages (e.g., Python and MATLAB) for instant fitness evaluations. We comprehensively assessed the CitySeg/MOP test suite on various multi-objective evolutionary algorithms, showcasing its versatility and practicality. Source codes are available at https://github.com/EMI-Group/evoxbench.

[118]  arXiv:2404.16267 [pdf, ps, other]
Title: Dynamic PageRank: Algorithms and Lower Bounds
Subjects: Data Structures and Algorithms (cs.DS)

We consider the PageRank problem in the dynamic setting, where the goal is to explicitly maintain an approximate PageRank vector $\pi \in \mathbb{R}^n$ for a graph under a sequence of edge insertions and deletions. Our main result is a complete characterization of the complexity of dynamic PageRank maintenance for both multiplicative and additive ($L_1$) approximations.
First, we establish matching lower and upper bounds for maintaining additive approximate PageRank in both incremental and decremental settings. In particular, we demonstrate that in the worst-case $(1/\alpha)^{\Theta(\log \log n)}$ update time is necessary and sufficient for this problem, where $\alpha$ is the desired additive approximation. On the other hand, we demonstrate that the commonly employed ForwardPush approach performs substantially worse than this optimal runtime. Specifically, we show that ForwardPush requires $\Omega(n^{1-\delta})$ time per update on average, for any $\delta > 0$, even in the incremental setting.
For multiplicative approximations, however, we demonstrate that the situation is significantly more challenging. Specifically, we prove that any algorithm that explicitly maintains a constant factor multiplicative approximation of the PageRank vector of a directed graph must have amortized update time $\Omega(n^{1-\delta})$, for any $\delta > 0$, even in the incremental setting, thereby resolving a 13-year old open question of Bahmani et al.~(VLDB 2010). This sharply contrasts with the undirected setting, where we show that $\rm{poly}\ \log n$ update time is feasible, even in the fully dynamic setting under oblivious adversary.

[119]  arXiv:2404.16268 [pdf, other]
Title: Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis
Comments: 9 pages, 7 figures, accepted at 2024 IEEE/CVF Computer Vision and Pattern Recognition Vision for Agriculture Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pooling layers (e.g., max and average) may overlook important information encoded in the spatial arrangement of pixel intensity and/or feature values. We propose a novel lacunarity pooling layer that aims to capture the spatial heterogeneity of the feature maps by evaluating the variability within local windows. The layer operates at multiple scales, allowing the network to adaptively learn hierarchical features. The lacunarity pooling layer can be seamlessly integrated into any artificial neural network architecture. Experimental results demonstrate the layer's effectiveness in capturing intricate spatial patterns, leading to improved feature extraction capabilities. The proposed approach holds promise in various domains, especially in agricultural image analysis tasks. This work contributes to the evolving landscape of artificial neural network architectures by introducing a novel pooling layer that enriches the representation of spatial features. Our code is publicly available.

[120]  arXiv:2404.16271 [pdf, ps, other]
Title: True random number generation using metastable 1T' molybdenum ditelluride
Subjects: Cryptography and Security (cs.CR); Materials Science (cond-mat.mtrl-sci)

True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizations and statistical analysis of the characteristics of the conductance noise suggest that the noise arises from the volatility of the stochastic polarization of the underlying ferroelectric dipoles in the 1T' MoTe2. Further, as proved in our experiments and indicated by our Monte Carlo simulation, the ferroelectric dipole polarization is a reliable entropy source with the stochastic polarization persistent and stable over time. Exploiting the conductance noise, we achieve the generation of true random numbers and demonstrate their use in common cryptographic applications, for example, password generation and data encryption. Besides, particularly, we show a privacy safeguarding approach to sensitive data that can be critical for the cryptography of neural networks. We believe our work will bring insights into the understanding of the metastable 1T' MoTe2 and, more importantly, underpin its great potential in secure cryptography.

[121]  arXiv:2404.16274 [pdf, other]
Title: Velocity-Based Monte Carlo Fluids
Subjects: Graphics (cs.GR); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

We present a velocity-based Monte Carlo fluid solver that overcomes the limitations of its existing vorticity-based counterpart. Because the velocity-based formulation is more commonly used in graphics, our Monte Carlo solver can be readily extended with various techniques from the fluid simulation literature. We derive our method by solving the Navier-Stokes equations via operator splitting and designing a pointwise Monte Carlo estimator for each substep. We reformulate the projection and diffusion steps as integration problems based on the recently introduced walk-on-boundary technique [Sugimoto et al. 2023]. We transform the volume integral arising from the source term of the pressure Poisson equation into a form more amenable to practical numerical evaluation. Our resulting velocity-based formulation allows for the proper simulation of scenes that the prior vorticity-based Monte Carlo method [Rioux-Lavoie and Sugimoto et al. 2022] either simulates incorrectly or cannot support. We demonstrate that our method can easily incorporate advancements drawn from conventional non-Monte Carlo methods by showing how one can straightforwardly add buoyancy effects, divergence control capabilities, and numerical dissipation reduction methods, such as advection-reflection and PIC/FLIP methods.

[122]  arXiv:2404.16275 [pdf, ps, other]
Title: Spectrum Sharing Policy in the Asia-Pacific Region
Comments: 33 pages, 17figures
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Signal Processing (eess.SP)

In this chapter, we investigate the spectrum measurement results in Asia-Pacific region. Then the spectrum sharing policy in the Asia-Pacific region is reviewed in details, where the national projects and strategies on spectrum refarming and spectrum sharing in China, Japan, Singapore, India, Korea and Australia are investigated. Then we introduce the spectrum sharing test-bed that is developed in China, which is a cognitive radio enabled TD-LTE test-bed utilizing TVWS. This chapter provides a brief introduction of the spectrum sharing mechanism and policy of Asia-Pacific region.

[123]  arXiv:2404.16277 [pdf, ps, other]
Title: Causally Inspired Regularization Enables Domain General Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations. For the standard input-output predictive setting, we categorize the set of graphs considered in the literature into two distinct groups: (i) those in which the empirical risk minimizer across training domains gives domain-general representations and (ii) those where it does not. For the latter case (ii), we propose a novel framework with regularizations, which we demonstrate are sufficient for identifying domain-general feature representations without a priori knowledge (or proxies) of the spurious features. Empirically, our proposed method is effective for both (semi) synthetic and real-world data, outperforming other state-of-the-art methods in average and worst-domain transfer accuracy.

[124]  arXiv:2404.16280 [pdf, ps, other]
Title: An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research based on LSHADE has also been widely studied by researchers. However, although recently proposed improvement strategies have shown superiority over their previous generation's first performance, adding all new strategies may not necessarily bring the strongest performance. Therefore, we recombine some effective advances based on advanced differential evolution variants in recent years and finally determine an effective combination scheme to further promote the performance of differential evolution. In this paper, we propose a strategy recombination and reconstruction differential evolution algorithm called reconstructed differential evolution (RDE) to solve single-objective bounded optimization problems. Based on the benchmark suite of the 2024 IEEE Congress on Evolutionary Computation (CEC2024), we tested RDE and several other advanced differential evolution variants. The experimental results show that RDE has superior performance in solving complex optimization problems.

[125]  arXiv:2404.16281 [pdf, other]
Title: Timely Communications for Remote Inference
Comments: arXiv admin note: text overlap with arXiv:2208.06948
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG)

In this paper, we analyze the impact of data freshness on remote inference systems, where a pre-trained neural network infers a time-varying target (e.g., the locations of vehicles and pedestrians) based on features (e.g., video frames) observed at a sensing node (e.g., a camera). One might expect that the performance of a remote inference system degrades monotonically as the feature becomes stale. Using an information-theoretic analysis, we show that this is true if the feature and target data sequence can be closely approximated as a Markov chain, whereas it is not true if the data sequence is far from Markovian. Hence, the inference error is a function of Age of Information (AoI), where the function could be non-monotonic. To minimize the inference error in real-time, we propose a new "selection-from-buffer" model for sending the features, which is more general than the "generate-at-will" model used in earlier studies. In addition, we design low-complexity scheduling policies to improve inference performance. For single-source, single-channel systems, we provide an optimal scheduling policy. In multi-source, multi-channel systems, the scheduling problem becomes a multi-action restless multi-armed bandit problem. For this setting, we design a new scheduling policy by integrating Whittle index-based source selection and duality-based feature selection-from-buffer algorithms. This new scheduling policy is proven to be asymptotically optimal. These scheduling results hold for minimizing general AoI functions (monotonic or non-monotonic). Data-driven evaluations demonstrate the significant advantages of our proposed scheduling policies.

[126]  arXiv:2404.16282 [pdf, other]
Title: Adaptive tracking control for non-periodic reference signals under quantized observations
Subjects: Systems and Control (eess.SY)

This paper considers an adaptive tracking control problem for stochastic regression systems with multi-threshold quantized observations. Different from the existing studies for periodic reference signals, the reference signal in this paper is non-periodic. Its main difficulty is how to ensure that the designed controller satisfies the uniformly bounded and excitation conditions that guarantee the convergence of the estimation in the controller under non-periodic signal conditions. This paper designs two backward-shifted polynomials with time-varying parameters and a special projection structure, which break through periodic limitations and establish the convergence and tracking properties. To be specific, the adaptive tracking control law can achieve asymptotically optimal tracking for the non-periodic reference signal; Besides, the proposed estimation algorithm is proved to converge to the true values in almost sure and mean square sense, and the convergence speed can reach $O\left(\frac{1}{k}\right)$ under suitable conditions. Finally, the effectiveness of the proposed adaptive tracking control scheme is verified through a simulation.

[127]  arXiv:2404.16283 [pdf, other]
Title: Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Comments: 16 pages, 22 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

The advent of large language models (LLMs) has transformed text-based services, enabling capabilities ranging from real-time translation to AI-driven chatbots. However, existing serving systems primarily focus on optimizing server-side aggregate metrics like token generation throughput, ignoring individual user experience with streamed text. As a result, under high and/or bursty load, a significant number of users can receive unfavorable service quality or poor Quality-of-Experience (QoE). In this paper, we first formally define QoE of text streaming services, where text is delivered incrementally and interactively to users, by considering the end-to-end token delivery process throughout the entire interaction with the user. Thereafter, we propose Andes, a QoE-aware serving system that enhances user experience for LLM-enabled text streaming services. At its core, Andes strategically allocates contended GPU resources among multiple requests over time to optimize their QoE. Our evaluations demonstrate that, compared to the state-of-the-art LLM serving systems like vLLM, Andes improves the average QoE by up to 3.2$\times$ under high request rate, or alternatively, it attains up to 1.6$\times$ higher request rate while preserving high QoE.

[128]  arXiv:2404.16289 [pdf, other]
Title: Deep Joint CSI Feedback and Multiuser Precoding for MIMO OFDM Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The design of precoding plays a crucial role in achieving a high downlink sum-rate in multiuser multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. In this correspondence, we propose a deep learning based joint CSI feedback and multiuser precoding method in frequency division duplex systems, aiming at maximizing the downlink sum-rate performance in an end-to-end manner. Specifically, the eigenvectors of the CSI matrix are compressed using deep joint source-channel coding techniques. This compression method enhances the resilience of the feedback CSI information against degradation in the feedback channel. A joint multiuser precoding module and a power allocation module are designed to adjust the precoding direction and the precoding power for users based on the feedback CSI information. Experimental results demonstrate that the downlink sum-rate can be significantly improved by using the proposed method, especially in scenarios with low signal-to-noise ratio and low feedback overhead.

[129]  arXiv:2404.16292 [pdf, other]
Title: One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns
Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable of producing spatially-varying noise blends despite not having access to such data for training. These features are enabled by training a denoising diffusion model using a novel combination of data augmentation and network conditioning techniques. Like procedural noise generators, the model's behavior is controllable via interpretable parameters and a source of randomness. We use our model to produce a variety of visually compelling noise textures. We also present an application of our model to improving inverse procedural material design; using our model in place of fixed-type noise nodes in a procedural material graph results in higher-fidelity material reconstructions without needing to know the type of noise in advance.

[130]  arXiv:2404.16294 [pdf, other]
Title: LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
Comments: To appear in NAACL 2024 at the 6th Clinical Natural Language Processing Workshop
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Electronic health records (EHR) even though a boon for healthcare practitioners, are growing convoluted and longer every day. Sifting around these lengthy EHRs is taxing and becomes a cumbersome part of physician-patient interaction. Several approaches have been proposed to help alleviate this prevalent issue either via summarization or sectioning, however, only a few approaches have truly been helpful in the past. With the rise of automated methods, machine learning (ML) has shown promise in solving the task of identifying relevant sections in EHR. However, most ML methods rely on labeled data which is difficult to get in healthcare. Large language models (LLMs) on the other hand, have performed impressive feats in natural language processing (NLP), that too in a zero-shot manner, i.e. without any labeled data. To that end, we propose using LLMs to identify relevant section headers. We find that GPT-4 can effectively solve the task on both zero and few-shot settings as well as segment dramatically better than state-of-the-art methods. Additionally, we also annotate a much harder real world dataset and find that GPT-4 struggles to perform well, alluding to further research and harder benchmarks.

[131]  arXiv:2404.16296 [pdf, ps, other]
Title: Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues. This paper introduces a new splicing image detection algorithm based on the statistical characteristics of natural images, aimed at improving the accuracy and efficiency of splicing image detection. By analyzing the limitations of traditional methods, we have developed a detection framework that integrates advanced statistical analysis techniques and machine learning methods. The algorithm has been validated using multiple public datasets, showing high accuracy in detecting spliced edges and locating tampered areas, as well as good robustness. Additionally, we explore the potential applications and challenges faced by the algorithm in real-world scenarios. This research not only provides an effective technological means for the field of image tampering detection but also offers new ideas and methods for future related research.

[132]  arXiv:2404.16297 [pdf, other]
Title: When Fuzzing Meets LLMs: Challenges and Opportunities
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

Fuzzing, a widely-used technique for bug detection, has seen advancements through Large Language Models (LLMs). Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identified five major challenges of LLM-assisted fuzzing. To support our findings, we revisited the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a remedy, we propose some actionable recommendations to help improve applying LLM in Fuzzing and conduct preliminary evaluations on DBMS fuzzing. The results demonstrate that our recommendations effectively address the identified challenges.

[133]  arXiv:2404.16300 [pdf, other]
Title: Reinforcement Learning with Generative Models for Compact Support Sets
Comments: 4 pages, 2 figures. Code available at: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Foundation models contain a wealth of information from their vast number of training samples. However, most prior arts fail to extract this information in a precise and efficient way for small sample sizes. In this work, we propose a framework utilizing reinforcement learning as a control for foundation models, allowing for the granular generation of small, focused synthetic support sets to augment the performance of neural network models on real data classification tasks. We first allow a reinforcement learning agent access to a novel context based dictionary; the agent then uses this dictionary with a novel prompt structure to form and optimize prompts as inputs to generative models, receiving feedback based on a reward function combining the change in validation accuracy and entropy. A support set is formed this way over several exploration steps. Our framework produced excellent results, increasing classification accuracy by significant margins for no additional labelling or data cost.

[134]  arXiv:2404.16301 [pdf, other]
Title: Style Adaptation for Domain-adaptive Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unsupervised Domain Adaptation (UDA) refers to the method that utilizes annotated source domain data and unlabeled target domain data to train a model capable of generalizing to the target domain data. Domain discrepancy leads to a significant decrease in the performance of general network models trained on the source domain data when applied to the target domain. We introduce a straightforward approach to mitigate the domain discrepancy, which necessitates no additional parameter calculations and seamlessly integrates with self-training-based UDA methods. Through the transfer of the target domain style to the source domain in the latent feature space, the model is trained to prioritize the target domain style during the decision-making process. We tackle the problem at both the image-level and shallow feature map level by transferring the style information from the target domain to the source domain data. As a result, we obtain a model that exhibits superior performance on the target domain. Our method yields remarkable enhancements in the state-of-the-art performance for synthetic-to-real UDA tasks. For example, our proposed method attains a noteworthy UDA performance of 76.93 mIoU on the GTA->Cityscapes dataset, representing a notable improvement of +1.03 percentage points over the previous state-of-the-art results.

[135]  arXiv:2404.16302 [pdf, other]
Title: CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions
Comments: The dataset and source code will be made publicly available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Image and Video Processing (eess.IV)

Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods severely degrade in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to augment detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW is able to mine more essential information of pedestrian features in cross-modality fusion, thus could transfer to other rarer scenarios with high efficiency and has adequate availability on those platforms with low computing power. To the best of our knowledge, this is the first study that targeted improvement and integrated both Diffusion and Mamba modules in cross-modality object detection, successfully expanding the practical application of this type of model with its higher accuracy and more advanced architecture. Extensive experiments on both well-recognized and self-created datasets conclusively demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.

[136]  arXiv:2404.16304 [pdf, other]
Title: BezierFormer: A Unified Architecture for 2D and 3D Lane Detection
Comments: ICME 2024, 11 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Lane detection has made significant progress in recent years, but there is not a unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce B\'{e}zierFormer, a unified 2D and 3D lane detection architecture based on B\'{e}zier curve lane representation. B\'{e}zierFormer formulate queries as B\'{e}zier control points and incorporate a novel B\'{e}zier curve attention mechanism. This attention mechanism enables comprehensive and accurate feature extraction for slender lane curves via sampling and fusing multiple reference points on each curve. In addition, we propose a novel Chamfer IoU-based loss which is more suitable for the B\'{e}zier control points regression. The state-of-the-art performance of B\'{e}zierFormer on widely-used 2D and 3D lane detection benchmarks verifies its effectiveness and suggests the worthiness of further exploration.

[137]  arXiv:2404.16305 [pdf, other]
Title: Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Existing works have made strides in video generation, but the lack of sound effects (SFX) and background music (BGM) hinders a complete and immersive viewer experience. We introduce a novel semantically consistent v ideo-to-audio generation framework, namely SVA, which automatically generates audio semantically consistent with the given video content. The framework harnesses the power of multimodal large language model (MLLM) to understand video semantics from a key frame and generate creative audio schemes, which are then utilized as prompts for text-to-audio models, resulting in video-to-audio generation with natural language as an interface. We show the satisfactory performance of SVA through case study and discuss the limitations along with the future research direction. The project page is available at https://huiz-a.github.io/audio4video.github.io/.

[138]  arXiv:2404.16306 [pdf, other]
Title: TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation.

[139]  arXiv:2404.16307 [pdf, other]
Title: Boosting Model Resilience via Implicit Adversarial Data Augmentation
Comments: 9 pages, 6 figures, accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Data augmentation plays a pivotal role in enhancing and diversifying training data. Nonetheless, consistently improving model performance in varied learning scenarios, especially those with inherent data biases, remains challenging. To address this, we propose to augment the deep features of samples by incorporating their adversarial and anti-adversarial perturbation distributions, enabling adaptive adjustment in the learning difficulty tailored to each sample's specific characteristics. We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function as the number of augmented copies increases indefinitely. This insight leads us to develop a meta-learning-based framework for optimizing classifiers with this novel loss, introducing the effects of augmentation while bypassing the explicit augmentation process. We conduct extensive experiments across four common biased learning scenarios: long-tail learning, generalized long-tail learning, noisy label learning, and subpopulation shift learning. The empirical results demonstrate that our method consistently achieves state-of-the-art performance, highlighting its broad adaptability.

[140]  arXiv:2404.16308 [pdf, other]
Title: WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models
Comments: Accepted at LREC-COLING 2024. Wenlong and Debanjan contributed equally
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)

The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses. However, this awareness of LMs has been insufficiently studied, since the computer science community lacks access to the large-scale real-world data about multi-cultural values. In this paper, we present WorldValuesBench, a globally diverse, large-scale benchmark dataset for the multi-cultural value prediction task, which requires a model to generate a rating response to a value question based on demographic contexts. Our dataset is derived from an influential social science project, World Values Survey (WVS), that has collected answers to hundreds of value questions (e.g., social, economic, ethical) from 94,728 participants worldwide. We have constructed more than 20 million examples of the type "(demographic attributes, value question) $\rightarrow$ answer" from the WVS responses. We perform a case study using our dataset and show that the task is challenging for strong open and closed-source models. On merely $11.1\%$, $25.0\%$, $72.2\%$, and $75.0\%$ of the questions, Alpaca-7B, Vicuna-7B-v1.5, Mixtral-8x7B-Instruct-v0.1, and GPT-3.5 Turbo can respectively achieve $<0.2$ Wasserstein 1-distance from the human normalized answer distributions. WorldValuesBench opens up new research avenues in studying limitations and opportunities in multi-cultural value awareness of LMs.

[141]  arXiv:2404.16309 [pdf, other]
Title: Robot Swarm Control Based on Smoothed Particle Hydrodynamics for Obstacle-Unaware Navigation
Subjects: Robotics (cs.RO)

Robot swarms hold immense potential for performing complex tasks far beyond the capabilities of individual robots. However, the challenge in unleashing this potential is the robots' limited sensory capabilities, which hinder their ability to detect and adapt to unknown obstacles in real-time. To overcome this limitation, we introduce a novel robot swarm control method with an indirect obstacle detector using a smoothed particle hydrodynamics (SPH) model. The indirect obstacle detector can predict the collision with an obstacle and its collision point solely from the robot's velocity information. This approach enables the swarm to effectively and accurately navigate environments without the need for explicit obstacle detection, significantly enhancing their operational robustness and efficiency. Our method's superiority is quantitatively validated through a comparative analysis, showcasing its significant navigation and pattern formation improvements under obstacle-unaware conditions.

[142]  arXiv:2404.16312 [pdf, other]
Title: 3D Guidance Law for Maximal Coverage and Target Enclosing with Inherent Safety
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Robotics (cs.RO)

In this paper, we address the problem of enclosing an arbitrarily moving target in three dimensions by a single pursuer, which is an unmanned aerial vehicle (UAV), for maximum coverage while also ensuring the pursuer's safety by preventing collisions with the target. The proposed guidance strategy steers the pursuer to a safe region of space surrounding the target, allowing it to maintain a certain distance from the latter while offering greater flexibility in positioning and converging to any orbit within this safe zone. Our approach is distinguished by the use of nonholonomic constraints to model vehicles with accelerations serving as control inputs and coupled engagement kinematics to craft the pursuer's guidance law meticulously. Furthermore, we leverage the concept of the Lyapunov Barrier Function as a powerful tool to constrain the distance between the pursuer and the target within asymmetric bounds, thereby ensuring the pursuer's safety within the predefined region. To validate the efficacy and robustness of our algorithm, we conduct experimental tests by implementing a high-fidelity quadrotor model within Software-in-the-loop (SITL) simulations, encompassing various challenging target maneuver scenarios. The results obtained showcase the resilience of the proposed guidance law, effectively handling arbitrarily maneuvering targets, vehicle/autopilot dynamics, and external disturbances. Our method consistently delivers stable global enclosing behaviors, even in response to aggressive target maneuvers, and requires only relative information for successful execution.

[143]  arXiv:2404.16313 [pdf, ps, other]
Title: Further Investigations on Nonlinear Complexity of Periodic Binary Sequences
Subjects: Information Theory (cs.IT)

Nonlinear complexity is an important measure for assessing the randomness of sequences. In this paper we investigate how circular shifts affect the nonlinear complexities of finite-length binary sequences and then reveal a more explicit relation between nonlinear complexities of finite-length binary sequences and their corresponding periodic sequences. Based on the relation, we propose two algorithms that can generate all periodic binary sequences with any prescribed nonlinear complexity.

[144]  arXiv:2404.16314 [pdf, other]
Title: Parallel and (Nearly) Work-Efficient Dynamic Programming
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. However, in parallel, many fundamental and sequentially simple problems become more challenging, and open to a (nearly) work-efficient solution (i.e., the work is off by at most a polylogarithmic factor over the best sequential solution). In fact, sequential DP algorithms employ many advanced optimizations such as decision monotonicity or special data structures, and achieve better work than straightforward solutions. Many such optimizations are inherently sequential, which creates extra challenges for a parallel algorithm to achieve the same work bound.
The goal of this paper is to achieve (nearly) work-efficient parallel DP algorithms by parallelizing classic, highly-optimized and practical sequential algorithms. We show a general framework called the Cordon Algorithm for parallel DP algorithms, and use it to solve several classic problems. Our selection of problems includes Longest Increasing Subsequence (LIS), sparse Longest Common Subsequence (LCS), convex/concave generalized Least Weight Subsequence (LWS), Optimal Alphabetic Tree (OAT), and more. We show how the Cordon Algorithm can be used to achieve the same level of optimization as the sequential algorithms, and achieve good parallelism. Many of our algorithms are conceptually simple, and we show some experimental results as proofs-of-concept.

[145]  arXiv:2404.16317 [pdf, other]
Title: FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction
Comments: 10 pages, 3 figures
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Tensors play a vital role in machine learning (ML) and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is challenging. This paper introduces FLAASH, a flexible and modular accelerator design for sparse tensor contraction that achieves over 25x speedup for a deep learning workload. Our architecture performs sparse high-order tensor contraction by distributing sparse dot products, or portions thereof, to numerous Sparse Dot Product Engines (SDPEs). Memory structure and job distribution can be customized, and we demonstrate a simple approach as a proof of concept. We address the challenges associated with control flow to navigate data structures, high-order representation, and high-sparsity handling. The effectiveness of our approach is demonstrated through various evaluations, showcasing significant speedup as sparsity and order increase.

[146]  arXiv:2404.16318 [pdf, other]
Title: The Continuous-Time Weighted-Median Opinion Dynamics
Comments: 13 pages, 1 figure
Subjects: Systems and Control (eess.SY)

Opinion dynamics models are important in understanding and predicting opinion formation processes within social groups. Although the weighted-averaging opinion-update mechanism is widely adopted as the micro-foundation of opinion dynamics, it bears a non-negligibly unrealistic implication: opinion attractiveness increases with opinion distance. Recently, the weighted-median mechanism has been proposed as a new microscopic mechanism of opinion exchange. Numerous advancements have been achieved regarding this new micro-foundation, from theoretical analysis to empirical validation, in a discrete-time asynchronous setup. However, the original discrete-time weighted-median model does not allow for "compromise behavior" in opinion exchanges, i.e., no intermediate opinions are created between disagreeing agents. To resolve this problem, this paper propose a novel continuous-time weighted-median opinion dynamics model, in which agents' opinions move towards the weighted-medians of their out-neighbors' opinions. It turns out that the proof methods for the original discrete-time asynchronous model are no longer applicable to the analysis of the continuous-time model. In this paper, we first establish the existence and uniqueness of the solution to the continuous-time weighted-median opinion dynamics by showing that the weighted-median mapping is contractive on any graph. We also characterize the set of all the equilibria. Then, by leveraging a new LaSalle invariance principle argument, we prove the convergence of the continuous-time weighted-median model for any initial condition and derive a necessary and sufficient condition for the convergence to consensus.

[147]  arXiv:2404.16322 [pdf, other]
Title: Bridging Speed and Accuracy to Approximate $K$-Nearest Neighbor Search
Comments: 13 pages
Subjects: Databases (cs.DB)

Approximate K-Nearest Neighbor (AKNN) search in high-dimensional spaces is a critical yet challenging problem. The efficiency of AKNN search largely depends on the computation of distances, a process that significantly affects the runtime. To improve computational efficiency, existing work often opts for estimating approximate distances rather than computing exact distances, at the cost of reduced AKNN search accuracy. The recent method of ADSampling has attempted to mitigate this problem by using random projection for distance approximations and adjusting these approximations based on error bounds to improve accuracy. However, ADSampling faces limitations in effectiveness and generality, mainly due to the suboptimality of its distance approximations and its heavy reliance on random projection matrices to obtain error bounds. In this study, we propose a new method that uses an optimal orthogonal projection instead of random projection, thereby providing improved distance approximations. Moreover, our method uses error quantiles instead of error bounds for approximation adjustment, and the derivation of error quantiles can be made independent of the projection matrix, thus extending the generality of our approach. Extensive experiments confirm the superior efficiency and effectiveness of the proposed method. In particular, compared to the state-of-the-art method of ADSampling, our method achieves a speedup of 1.6 to 2.1 times on real datasets with almost no loss of accuracy.

[148]  arXiv:2404.16323 [pdf, other]
Title: DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework which generates 3D Gaussians in decoder with the guidance of depth-aware image features from encoder. In particular, we introduce the use of deformable transformer, allowing efficient and effective decoding through 3D reference point and multi-layer refinement adaptations. By harnessing the benefits of 3D Gaussians, our approach offers an efficient and accurate solution for 3D reconstruction from single-view images. We evaluate our method on the ShapeNet SRN dataset, getting PSNR of 24.21 and 24.98 in car and chair dataset, respectively. The result outperforming the recent method by around 2.25%, demonstrating the effectiveness of our method in achieving superior results.

[149]  arXiv:2404.16324 [pdf, other]
Title: Improved impedance inversion by deep learning and iterated graph Laplacian
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Signal Processing (eess.SP)

Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results.
We present a hybrid method that combines deep learning with iterated graph Laplacian and show its application in acoustic impedance inversion which is a routine procedure in seismic explorations. A neural network is used to obtain a first approximation of the underlying acoustic impedance and construct a graph Laplacian matrix from this approximation. Afterwards, we use a Tikhonov-like variational method to solve the impedance inversion problem where the regularizer is based on the constructed graph Laplacian. The obtained solution can be shown to be more accurate and stable with respect to noise than the initial guess obtained by the neural network. This process can be iterated several times, each time constructing a new graph Laplacian matrix from the most recent reconstruction. The method converges after only a few iterations returning a much more accurate reconstruction.
We demonstrate the potential of our method on two different datasets and under various levels of noise. We use two different neural networks that have been introduced in previous works. The experiments show that our approach improves the reconstruction quality in the presence of noise.

[150]  arXiv:2404.16325 [pdf, other]
Title: Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite the remarkable success of deep learning in medical imaging analysis, medical image segmentation remains challenging due to the scarcity of high-quality labeled images for supervision. Further, the significant domain gap between natural and medical images in general and ultrasound images in particular hinders fine-tuning models trained on natural images to the task at hand. In this work, we address the performance degradation of segmentation models in low-data regimes and propose a prompt-less segmentation method harnessing the ability of segmentation foundation models to segment abstract shapes. We do that via our novel prompt point generation algorithm which uses coarse semantic segmentation masks as input and a zero-shot prompt-able foundation model as an optimization target. We demonstrate our method on a segmentation findings task (pathologic anomalies) in ultrasound images. Our method's advantages are brought to light in varying degrees of low-data regime experiments on a small-scale musculoskeletal ultrasound images dataset, yielding a larger performance gain as the training set size decreases.

[151]  arXiv:2404.16326 [pdf, other]
Title: NeuroKoopman Dynamic Causal Discovery
Subjects: Machine Learning (cs.LG)

In many real-world applications where the system dynamics has an underlying interdependency among its variables (such as power grid, economics, neuroscience, omics networks, environmental ecosystems, and others), one is often interested in knowing whether the past values of one time series influences the future of another, known as Granger causality, and the associated underlying dynamics. This paper introduces a Koopman-inspired framework that leverages neural networks for data-driven learning of the Koopman bases, termed NeuroKoopman Dynamic Causal Discovery (NKDCD), for reliably inferring the Granger causality along with the underlying nonlinear dynamics. NKDCD employs an autoencoder architecture that lifts the nonlinear dynamics to a higher dimension using data-learned bases, where the lifted time series can be reliably modeled linearly. The lifting function, the linear Granger causality lag matrices, and the projection function (from lifted space to base space) are all represented as multilayer perceptrons and are all learned simultaneously in one go. NKDCD also utilizes sparsity-inducing penalties on the weights of the lag matrices, encouraging the model to select only the needed causal dependencies within the data. Through extensive testing on practically applicable datasets, it is shown that the NKDCD outperforms the existing nonlinear Granger causality discovery approaches.

[152]  arXiv:2404.16327 [pdf, other]
Title: Generalized Step-Chirp Sequences With Flexible Bandwidth
Authors: Cheng Du, Yi Jiang
Comments: Accepted by 2024 IEEE International Symposium on Information Theory
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Sequences with low aperiodic autocorrelation sidelobes have been extensively researched in literatures. With sufficiently low integrated sidelobe level (ISL), their power spectrums are asymptotically flat over the whole frequency domain. However, for the beam sweeping in the massive multi-input multi-output (MIMO) broadcast channels, the flat spectrum should be constrained in a passband with tunable bandwidth to achieve the flexible tradeoffs between the beamforming gain and the beam sweeping time. Motivated by this application, we construct a family of sequences termed the generalized step-chirp (GSC) sequence with a closed-form expression, where some parameters can be tuned to adjust the bandwidth flexibly. In addition to the application in beam sweeping, some GSC sequences are closely connected with Mow's unified construction of sequences with perfect periodic autocorrelations, and may have a coarser phase resolution than the Mow sequence while their ISLs are comparable.

[153]  arXiv:2404.16329 [pdf, other]
Title: On Approximating the Dynamic and Discrete Network Flow Problem
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Computational Geometry (cs.CG)

We examine the dynamic network flow problem under the assumption that the flow consists of discrete units. The dynamic network flow problem is commonly addressed in the context of developing evacuation plans, where the flow is typically treated as a continuous quantity. However, real-world scenarios often involve moving groups, such as families, as single units. We demonstrate that solving the dynamic flow problem with this consideration is APX-hard. Conversely, we present a PTAS for instances where the base graph is a path with a constant number of nodes. We introduce a `ready time' constraint to the minsum bin packing problem, meaning certain items cannot be placed in specific bins, develop a PTAS for this modified problem, and apply our algorithms to the discrete and dynamic flow problem.

[154]  arXiv:2404.16331 [pdf, other]
Title: IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Model Weight Averaging (MWA) is a technique that seeks to enhance model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) the vanilla MWA can benefit the class-imbalanced learning, and 2) performing model averaging in the early epochs of training yields a greater performance improvement than doing that in later epochs. Inspired by these two observations, in this paper we propose a novel MWA technique for class-imbalanced learning tasks named Iterative Model Weight Averaging (IMWA). Specifically, IMWA divides the entire training stage into multiple episodes. Within each episode, multiple models are concurrently trained from the same initialized model weight, and subsequently averaged into a singular model. Then, the weight of this average model serves as a fresh initialization for the ensuing episode, thus establishing an iterative learning paradigm. Compared to vanilla MWA, IMWA achieves higher performance improvements with the same computational cost. Moreover, IMWA can further enhance the performance of those methods employing EMA strategy, demonstrating that IMWA and EMA can complement each other. Extensive experiments on various class-imbalanced learning tasks, i.e., class-imbalanced image classification, semi-supervised class-imbalanced image classification and semi-supervised object detection tasks showcase the effectiveness of our IMWA.

[155]  arXiv:2404.16333 [pdf, other]
Title: AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation
Comments: under review
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)

Besides humans and machines, Artificial Intelligence (AI) models have emerged to be another important audience of programming languages, as we come to the era of large language models (LLMs). LLMs can now excel at coding competitions and even program like developers to address various tasks, such as math calculation. Yet, the grammar and layout of existing programs are designed for humans. Particularly, abundant grammar tokens and formatting tokens are included to make the code more readable to humans. While beneficial, such a human-centric design imposes an unnecessary computational burden on LLMs where each token, either consumed or generated, consumes computational resources. To improve inference efficiency and reduce computational costs, we propose the concept of AI-oriented grammar, which aims to represent the code in a way that better suits the working mechanism of AI models. Code written with AI-oriented grammar discards formats and uses a minimum number of tokens to convey code semantics effectively. To demonstrate the feasibility of this concept, we explore and implement the first AI-oriented grammar for Python, named Simple Python (SimPy). SimPy is crafted by revising the original Python grammar through a series of heuristic rules. Programs written in SimPy maintain identical Abstract Syntax Tree (AST) structures to those in standard Python, allowing execution via a modified AST parser. In addition, we explore methods to enable existing LLMs to proficiently understand and use SimPy, and ensure the changes remain imperceptible for human developers. Compared with the original Python, SimPy not only reduces token usage by 13.5% and 10.4% for CodeLlama and GPT-4, but can also achieve equivalent, even improved, performance over the models trained on Python code.

[156]  arXiv:2404.16336 [pdf, other]
Title: FedStyle: Style-Based Federated Learning Crowdsourcing Framework for Art Commissions
Comments: Accepted to ICME 2024
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The unique artistic style is crucial to artists' occupational competitiveness, yet prevailing Art Commission Platforms rarely support style-based retrieval. Meanwhile, the fast-growing generative AI techniques aggravate artists' concerns about releasing personal artworks to public platforms. To achieve artistic style-based retrieval without exposing personal artworks, we propose FedStyle, a style-based federated learning crowdsourcing framework. It allows artists to train local style models and share model parameters rather than artworks for collaboration. However, most artists possess a unique artistic style, resulting in severe model drift among them. FedStyle addresses such extreme data heterogeneity by having artists learn their abstract style representations and align with the server, rather than merely aggregating model parameters lacking semantics. Besides, we introduce contrastive learning to meticulously construct the style representation space, pulling artworks with similar styles closer and keeping different ones apart in the embedding space. Extensive experiments on the proposed datasets demonstrate the superiority of FedStyle.

[157]  arXiv:2404.16339 [pdf, other]
Title: Training-Free Unsupervised Prompt for Vision-Language Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Prompt learning has become the most effective paradigm for adapting large pre-trained vision-language models (VLMs) to downstream tasks. Recently, unsupervised prompt tuning methods, such as UPL and POUF, directly leverage pseudo-labels as supervisory information to fine-tune additional adaptation modules on unlabeled data. However, inaccurate pseudo labels easily misguide the tuning process and result in poor representation capabilities. In light of this, we propose Training-Free Unsupervised Prompts (TFUP), which maximally preserves the inherent representation capabilities and enhances them with a residual connection to similarity-based prediction probabilities in a training-free and labeling-free manner. Specifically, we integrate both instance confidence and prototype scores to select representative samples, which are used to customize a reliable Feature Cache Model (FCM) for training-free inference. Then, we design a Multi-level Similarity Measure (MSM) that considers both feature-level and semantic-level similarities to calculate the distance between each test image and the cached sample as the weight of the corresponding cached label to generate similarity-based prediction probabilities. In this way, TFUP achieves surprising performance, even surpassing the training-base method on multiple classification datasets. Based on our TFUP, we propose a training-based approach (TFUP-T) to further boost the adaptation performance. In addition to the standard cross-entropy loss, TFUP-T adopts an additional marginal distribution entropy loss to constrain the model from a global perspective. Our TFUP-T achieves new state-of-the-art classification performance compared to unsupervised and few-shot adaptation approaches on multiple benchmarks. In particular, TFUP-T improves the classification accuracy of POUF by 3.3% on the most challenging Domain-Net dataset.

[158]  arXiv:2404.16341 [pdf, other]
Title: PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin
Comments: 12 pages, 1 figure, 9 tables. Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

Computational historical linguistics seeks to systematically understand processes of sound change, including during periods at which little to no formal recording of language is attested. At the same time, few computational resources exist which deeply explore phonological and morphological connections between proto-languages and their descendants. This is particularly true for the family of Italic languages. To assist historical linguists in the study of Italic sound change, we introduce the Proto-Italic to Latin (PILA) dataset, which consists of roughly 3,000 pairs of forms from Proto-Italic and Latin. We provide a detailed description of how our dataset was created and organized. Then, we exhibit PILA's value in two ways. First, we present baseline results for PILA on a pair of traditional computational historical linguistics tasks. Second, we demonstrate PILA's capability for enhancing other historical-linguistic datasets through a dataset compatibility study.

[159]  arXiv:2404.16347 [pdf, other]
Title: Enhancing Arterial Blood Flow Simulations through Physics-Informed Neural Networks
Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)

This study introduces a computational approach leveraging physics-informed neural networks (PINNs) for the efficient computation of arterial blood flows, particularly focusing on solving the incompressible Navier-Stokes equations by using the domain decomposition technique. Unlike conventional computational fluid dynamics methods, PINNs offer advantages by eliminating the need for discretized meshes and enabling the direct solution of partial differential equations (PDEs). In this paper, we propose the weighted Extended Physics-Informed Neural Networks (WXPINNs) and weighted Conservative Physics-Informed Neural Networks (WCPINNs), tailored for detailed hemodynamic simulations based on generalized space-time domain decomposition techniques. The inclusion of multiple neural networks enhances the representation capacity of the weighted PINN methods. Furthermore, the weighted PINNs can be efficiently trained in parallel computing frameworks by employing separate neural networks for each sub-domain. We show that PINNs simulation results circumvent backflow instabilities, underscoring a notable advantage of employing PINNs over traditional numerical methods to solve such complex blood flow models. They naturally address such challenges within their formulations. The presented numerical results demonstrate that the proposed weighted PINNs outperform traditional PINNs settings, where sub-PINNs are applied to each subdomain separately. This study contributes to the integration of deep learning methodologies with fluid mechanics, paving the way for accurate and efficient high-fidelity simulations in biomedical applications, particularly in modeling arterial blood flow.

[160]  arXiv:2404.16348 [pdf, other]
Title: Dual Expert Distillation Network for Generalized Zero-Shot Learning
Comments: 11 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Zero-shot learning has consistently yielded remarkable progress via modeling nuanced one-to-one visual-attribute correlation. Existing studies resort to refining a uniform mapping function to align and correlate the sample regions and subattributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unutilized channel information. This paper addresses these issues by introducing a simple yet effective approach, dubbed Dual Expert Distillation Network (DEDN), where two experts are dedicated to coarse- and fine-grained visual-attribute modeling, respectively. Concretely, one coarse expert, namely cExp, has a complete perceptual scope to coordinate visual-attribute similarity metrics across dimensions, and moreover, another fine expert, namely fExp, consists of multiple specialized subnetworks, each corresponds to an exclusive set of attributes. Two experts cooperatively distill from each other to reach a mutual agreement during training. Meanwhile, we further equip DEDN with a newly designed backbone network, i.e., Dual Attention Network (DAN), which incorporates both region and channel attention information to fully exploit and leverage visual semantic knowledge. Experiments on various benchmark datasets indicate a new state-of-the-art.

[161]  arXiv:2404.16349 [pdf, ps, other]
Title: More Asymmetry Yields Faster Matrix Multiplication
Comments: 44 pages. arXiv admin note: text overlap with arXiv:2307.07970
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly the new improvement is achieved by incorporating more asymmetry in the analysis, circumventing a fundamental tool of prior work that requires two of the three dimensions to be treated identically. The method yields a new bound on the square matrix multiplication exponent $$\omega<2.371339,$$ improved from the previous bound of $\omega<2.371552$. We also improve the bounds of the exponents for multiplying rectangular matrices of various shapes.

[162]  arXiv:2404.16351 [pdf, other]
Title: QREChem: Quantum Resource Estimation Software for Chemistry Applications
Subjects: Emerging Technologies (cs.ET)

As quantum hardware continues to improve, more and more application scientists have entered the field of quantum computing. However, even with the rapid improvements in the last few years, quantum devices, especially for quantum chemistry applications, still struggle to perform calculations that classical computers could not calculate. In lieu of being able to perform specific calculations, it is important have a systematic way of estimating the resources necessary to tackle specific problems. Standard arguments about computational complexity provide hope that quantum computers will be useful for problems in quantum chemistry but obscure the true impact of many algorithmic overheads. These overheads will ultimately determine the precise point when quantum computers will perform better than classical computers. We have developed QREChem to provide logical resource estimates for ground state energy estimation in quantum chemistry through a Trotter-based quantum phase estimation approach. QREChem provides resource estimates which include the specific overheads inherent to problems in quantum chemistry by including heuristic estimates of the number of Trotter steps and number of necessary ancilla, allowing for more accurate estimates of the total number of gates. We utilize QREChem to provide logical resource estimates for a variety of small molecules in various basis sets, obtaining estimates in the range of $10^7-10^{15}$ for total number of T gates. We also determine estimates for the FeMoco molecule and compare all estimates to other resource estimation tools.

[163]  arXiv:2404.16356 [pdf, other]
Title: Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets, completing sensor data, and making sequential decisions. In addition, the mixture of experts (MoE) can enable the distributed and collaborative execution of AI models without performance degradation between connected vehicles. In this survey, we explore the integration of MoE and GAI to enable Artificial General Intelligence in IoV, which can enable the realization of full autonomy for IoV with minimal human supervision and applicability in a wide range of mobility scenarios, including environment monitoring, traffic management, and autonomous driving. In particular, we present the fundamentals of GAI, MoE, and their interplay applications in IoV. Furthermore, we discuss the potential integration of MoE and GAI in IoV, including distributed perception and monitoring, collaborative decision-making and planning, and generative modeling and simulation. Finally, we present several potential research directions for facilitating the integration.

[164]  arXiv:2404.16359 [pdf, other]
Title: An Improved Graph Pooling Network for Skeleton-Based Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations include: Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matrix of the original feature is used to adaptively adjust the weight of information in different regions of the newly generated features, resulting in more flexible and effective processing. To prevent the irreversible loss of discriminative information, we propose a cross fusion module and an information supplement module to provide block-level and input-level information respectively. As a plug-and-play structure, the proposed operation can be seamlessly combined with existing GCN-based models. We conducted extensive evaluations on several challenging benchmarks, and the experimental results indicate the effectiveness of our proposed solutions. For example, in the cross-subject evaluation of the NTU-RGB+D 60 dataset, IGPN achieves a significant improvement in accuracy compared to the baseline while reducing Flops by nearly 70%; a heavier version has also been introduced to further boost accuracy.

[165]  arXiv:2404.16361 [pdf, other]
Title: Evolutionary Causal Discovery with Relative Impact Stratification for Interpretable Data Analysis
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Symbolic Computation (cs.SC)

This study proposes Evolutionary Causal Discovery (ECD) for causal discovery that tailors response variables, predictor variables, and corresponding operators to research datasets. Utilizing genetic programming for variable relationship parsing, the method proceeds with the Relative Impact Stratification (RIS) algorithm to assess the relative impact of predictor variables on the response variable, facilitating expression simplification and enhancing the interpretability of variable relationships. ECD proposes an expression tree to visualize the RIS results, offering a differentiated depiction of unknown causal relationships compared to conventional causal discovery. The ECD method represents an evolution and augmentation of existing causal discovery methods, providing an interpretable approach for analyzing variable relationships in complex systems, particularly in healthcare settings with Electronic Health Record (EHR) data. Experiments on both synthetic and real-world EHR datasets demonstrate the efficacy of ECD in uncovering patterns and mechanisms among variables, maintaining high accuracy and stability across different noise levels. On the real-world EHR dataset, ECD reveals the intricate relationships between the response variable and other predictive variables, aligning with the results of structural equation modeling and shapley additive explanations analyses.

[166]  arXiv:2404.16362 [pdf, other]
Title: Feature graph construction with static features for malware detection
Subjects: Cryptography and Security (cs.CR)

Malware can greatly compromise the integrity and trustworthiness of information and is in a constant state of evolution. Existing feature fusion-based detection methods generally overlook the correlation between features. And mere concatenation of features will reduce the model's characterization ability, lead to low detection accuracy. Moreover, these methods are susceptible to concept drift and significant degradation of the model. To address those challenges, we introduce a feature graph-based malware detection method, MFGraph, to characterize applications by learning feature-to-feature relationships to achieve improved detection accuracy while mitigating the impact of concept drift. In MFGraph, we construct a feature graph using static features extracted from binary PE files, then apply a deep graph convolutional network to learn the representation of the feature graph. Finally, we employ the representation vectors obtained from the output of a three-layer perceptron to differentiate between benign and malicious software. We evaluated our method on the EMBER dataset, and the experimental results demonstrate that it achieves an AUC score of 0.98756 on the malware detection task, outperforming other baseline models. Furthermore, the AUC score of MFGraph decreases by only 5.884% in one year, indicating that it is the least affected by concept drift.

[167]  arXiv:2404.16363 [pdf, other]
Title: Byzantine Attacks Exploiting Penalties in Ethereum PoS
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In May 2023, the Ethereum blockchain experienced its first inactivity leak, a mechanism designed to reinstate chain finalization amid persistent network disruptions. This mechanism aims to reduce the voting power of validators who are unreachable within the network, reallocating this power to active validators. This paper investigates the implications of the inactivity leak on safety within the Ethereum blockchain. Our theoretical analysis reveals scenarios where actions by Byzantine validators expedite the finalization of two conflicting branches, and instances where Byzantine validators reach a voting power exceeding the critical safety threshold of one-third. Additionally, we revisit the probabilistic bouncing attack, illustrating how the inactivity leak can result in a probabilistic breach of safety, potentially allowing Byzantine validators to exceed the one-third safety threshold. Our findings uncover how penalizing inactive nodes can compromise blockchain properties, particularly in the presence of Byzantine validators capable of coordinating actions.

[168]  arXiv:2404.16364 [pdf, other]
Title: ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze
Subjects: Artificial Intelligence (cs.AI)

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost MCTS-based algorithms. Specifically, we propose a new scheme that simplifies data collecting and reanalyzing, which significantly reduces the search cost while guarantees the performance as well. Furthermore, to accelerate each search process, we conceive a method to reuse the subsequent information in the trajectory. The corresponding analysis conducted on the bandit model also provides auxiliary theoretical substantiation for our design. Experiments conducted on Atari environments and board games demonstrates that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.

[169]  arXiv:2404.16365 [pdf, other]
Title: VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Despite their remarkable successes, state-of-the-art language models face challenges in grasping certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task with a triplet of sentences associated with an image, to evaluate both vision-language models (VLMs) and unimodal language models (ULMs). An evaluation involving 34 VLMs and 20 ULMs reveals surprising difficulties in distinguishing between lexical and semantic variations. Spatial semantics encoded by language models also appear to be highly sensitive to lexical information. Notably, text encoders of VLMs demonstrate greater sensitivity to semantic and lexical variations than unimodal text encoders. Our contributions include the unification of image-to-text and text-to-text retrieval tasks, an off-the-shelf evaluation without fine-tuning, and assessing LMs' semantic (in)variance in the presence of lexical alterations. The results highlight strengths and weaknesses across diverse vision and unimodal language models, contributing to a deeper understanding of their capabilities. % VISLA enables a rigorous evaluation, shedding light on language models' capabilities in handling semantic and lexical nuances. Data and code will be made available at https://github.com/Sri-Harsha/visla_benchmark.

[170]  arXiv:2404.16366 [pdf, other]
Title: Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection
Comments: 14 pages, 9 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in the graph tend to exhibit consistent behaviors with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, resulting in sub-optimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, how to address the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learn effective representations for anomaly detection are still under-explored. To bridge this gap, in this paper, we propose a simple yet effective framework for Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection (G3AD). Specifically, G3AD introduces two auxiliary networks along with correlation constraints to guard the GNNs from inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching module to guard the GNNs from solely reconstructing the observed data that contains anomalies. Extensive experiments demonstrate that our proposed G3AD can outperform seventeen state-of-the-art methods on both synthetic and real-world datasets.

[171]  arXiv:2404.16367 [pdf, other]
Title: Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures without explicitly encoding any structural bias. In this work, we investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge. We extensively experiment with transformer models trained on multiple synthetic datasets and with different training objectives and show that while other objectives e.g. sequence-to-sequence modeling, prefix language modeling, often failed to lead to hierarchical generalization, models trained with the language modeling objective consistently learned to generalize hierarchically. We then conduct pruning experiments to study how transformers trained with the language modeling objective encode hierarchical structure. When pruned, we find joint existence of subnetworks within the model with different generalization behaviors (subnetworks corresponding to hierarchical structure and linear order). Finally, we take a Bayesian perspective to further uncover transformers' preference for hierarchical generalization: We establish a correlation between whether transformers generalize hierarchically on a dataset and whether the simplest explanation of that dataset is provided by a hierarchical grammar compared to regular grammars exhibiting linear generalization.

[172]  arXiv:2404.16369 [pdf, other]
Title: Don't Say No: Jailbreaking LLM by Suppressing Refusal
Subjects: Computation and Language (cs.CL)

Ensuring the safety alignment of Large Language Models (LLMs) is crucial to generating responses consistent with human values. Despite their ability to recognize and avoid harmful queries, LLMs are vulnerable to "jailbreaking" attacks, where carefully crafted prompts elicit them to produce toxic content. One category of jailbreak attacks is reformulating the task as adversarial attacks by eliciting the LLM to generate an affirmative response. However, the typical attack in this category GCG has very limited attack success rate. In this study, to better study the jailbreak attack, we introduce the DSN (Don't Say No) attack, which prompts LLMs to not only generate affirmative responses but also novelly enhance the objective to suppress refusals. In addition, another challenge lies in jailbreak attacks is the evaluation, as it is difficult to directly and accurately assess the harmfulness of the attack. The existing evaluation such as refusal keyword matching has its own limitation as it reveals numerous false positive and false negative instances. To overcome this challenge, we propose an ensemble evaluation pipeline incorporating Natural Language Inference (NLI) contradiction assessment and two external LLM evaluators. Extensive experiments demonstrate the potency of the DSN and the effectiveness of ensemble evaluation compared to baseline methods.

[173]  arXiv:2404.16370 [pdf, other]
Title: MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter
Comments: IEEE International Conference on Robotics and Automation (ICRA2024)
Subjects: Robotics (cs.RO)

This paper presents a 6-DoF range-based Monte Carlo localization method with a GPU-accelerated Stein particle filter. To update a massive amount of particles, we propose a Gauss-Newton-based Stein variational gradient descent (SVGD) with iterative neighbor particle search. This method uses SVGD to collectively update particle states with gradient and neighborhood information, which provides efficient particle sampling. For an efficient neighbor particle search, it uses locality sensitive hashing and iteratively updates the neighbor list of each particle over time. The neighbor list is then used to propagate the posterior probabilities of particles over the neighbor particle graph. The proposed method is capable of evaluating one million particles in real-time on a single GPU and enables robust pose initialization and re-localization without an initial pose estimate. In experiments, the proposed method showed an extreme robustness to complete sensor occlusion (i.e., kidnapping), and enabled pinpoint sensor localization without any prior information.

[174]  arXiv:2404.16371 [pdf, other]
Title: Multimodal Information Interaction for Medical Image Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The use of multimodal data in assisted diagnosis and segmentation has emerged as a prominent area of interest in current research. However, one of the primary challenges is how to effectively fuse multimodal features. Most of the current approaches focus on the integration of multimodal features while ignoring the correlation and consistency between different modal features, leading to the inclusion of potentially irrelevant information. To address this issue, we introduce an innovative Multimodal Information Cross Transformer (MicFormer), which employs a dual-stream architecture to simultaneously extract features from each modality. Leveraging the Cross Transformer, it queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features. Additionally, we incorporate a deformable Transformer architecture to expand the search space. We conducted experiments on the MM-WHS dataset, and in the CT-MRI multimodal image segmentation task, we successfully improved the whole-heart segmentation DICE score to 85.57 and MIoU to 75.51. Compared to other multimodal segmentation techniques, our method outperforms by margins of 2.83 and 4.23, respectively. This demonstrates the efficacy of MicFormer in integrating relevant information between different modalities in multimodal tasks. These findings hold significant implications for multimodal image tasks, and we believe that MicFormer possesses extensive potential for broader applications across various domains. Access to our method is available at https://github.com/fxxJuses/MICFormer

[175]  arXiv:2404.16375 [pdf, other]
Title: List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Comments: Preprint
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. These tags, marked with alphanumerics, can be indexed via text tokens for easy reference. Despite the extraordinary performance from GPT-4V, we observe that other Multimodal Large Language Models (MLLMs) struggle to understand these visual tags. To promote the learning of SoM prompting for open-source models, we propose a new learning paradigm: "list items one by one," which asks the model to enumerate and describe all visual tags placed on the image following the alphanumeric orders of tags. By integrating our curated dataset with other visual instruction tuning datasets, we are able to equip existing MLLMs with the SoM prompting ability. Furthermore, we evaluate our finetuned SoM models on five MLLM benchmarks. We find that this new dataset, even in a relatively small size (10k-30k images with tags), significantly enhances visual reasoning capabilities and reduces hallucinations for MLLMs. Perhaps surprisingly, these improvements persist even when the visual tags are omitted from input images during inference. This suggests the potential of "list items one by one" as a new paradigm for training MLLMs, which strengthens the object-text alignment through the use of visual tags in the training stage. Finally, we conduct analyses by probing trained models to understand the working mechanism of SoM. Our code and data are available at \url{https://github.com/zzxslp/SoM-LLaVA}.

[176]  arXiv:2404.16376 [pdf, ps, other]
Title: A Hypergraph Approach to Distributed Broadcast
Subjects: Information Theory (cs.IT); Multiagent Systems (cs.MA); Systems and Control (eess.SY)

This paper explores the distributed broadcast problem within the context of network communications, a critical challenge in decentralized information dissemination. We put forth a novel hypergraph-based approach to address this issue, focusing on minimizing the number of broadcasts to ensure comprehensive data sharing among all network users. A key contribution of our work is the establishment of a general lower bound for the problem using the min-cut capacity of hypergraphs. Additionally, we present the distributed broadcast for quasi-trees (DBQT) algorithm tailored for the unique structure of quasi-trees, which is proven to be optimal. This paper advances both network communication strategies and hypergraph theory, with implications for a wide range of real-world applications, from vehicular and sensor networks to distributed storage systems.

[177]  arXiv:2404.16379 [pdf, other]
Title: Optimal and Bounded Suboptimal Any-Angle Multi-agent Pathfinding
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)

Multi-agent pathfinding (MAPF) is the problem of finding a set of conflict-free paths for a set of agents. Typically, the agents' moves are limited to a pre-defined graph of possible locations and allowed transitions between them, e.g. a 4-neighborhood grid. We explore how to solve MAPF problems when each agent can move between any pair of possible locations as long as traversing the line segment connecting them does not lead to the collision with the obstacles. This is known as any-angle pathfinding. We present the first optimal any-angle multi-agent pathfinding algorithm. Our planner is based on the Continuous Conflict-based Search (CCBS) algorithm and an optimal any-angle variant of the Safe Interval Path Planning (TO-AA-SIPP). The straightforward combination of those, however, scales poorly since any-angle path finding induces search trees with a very large branching factor. To mitigate this, we adapt two techniques from classical MAPF to the any-angle setting, namely Disjoint Splitting and Multi-Constraints. Experimental results on different combinations of these techniques show they enable solving over 30% more problems than the vanilla combination of CCBS and TO-AA-SIPP. In addition, we present a bounded-suboptimal variant of our algorithm, that enables trading runtime for solution cost in a controlled manner.

[178]  arXiv:2404.16380 [pdf, ps, other]
Title: Efficient Higher-order Convolution for Small Kernels in Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep convolutional neural networks (DCNNs) are a class of artificial neural networks, primarily for computer vision tasks such as segmentation and classification. Many nonlinear operations, such as activation functions and pooling strategies, are used in DCNNs to enhance their ability to process different signals with different tasks. Conceptional convolution, a linear filter, is the essential component of DCNNs while nonlinear convolution is generally implemented as higher-order Volterra filters, However, for Volterra filtering, significant memory and computational costs pose a primary limitation for its widespread application in DCNN applications. In this study, we propose a novel method to perform higher-order Volterra filtering with lower memory and computation cost in forward and backward pass in DCNN training. The proposed method demonstrates computational advantages compared with conventional Volterra filter implementation. Furthermore, based on the proposed method, a new attention module called Higher-order Local Attention Block (HLA) is proposed and tested on CIFAR-100 dataset, which shows competitive improvement for classification task. Source code is available at: https://github.com/WinterWen666/Efficient-High-Order-Volterra-Convolution.git

[179]  arXiv:2404.16381 [pdf, ps, other]
Title: Abstracting Effect Systems for Algebraic Effect Handlers
Authors: Takuma Yoshioka (1), Taro Sekiyama (2), Atsushi Igarashi (1) ((1) Kyoto University, (2) National Institute of Informatics & SOKENDAI)
Subjects: Programming Languages (cs.PL)

Many effect systems for algebraic effect handlers are designed to guarantee that all invoked effects are handled adequately. However, respective researchers have developed their own effect systems that differ in how to represent the collections of effects that may happen. This situation results in blurring what is required for the representation and manipulation of effect collections in a safe effect system.
In this work, we present a language ${\lambda_{\mathrm{EA}}}$ equipped with an effect system that abstracts the existing effect systems for algebraic effect handlers. The effect system of ${\lambda_{\mathrm{EA}}}$ is parameterized over effect algebras, which abstract the representation and manipulation of effect collections in safe effect systems. We prove the type-and-effect safety of ${\lambda_{\mathrm{EA}}}$ by assuming that a given effect algebra meets certain properties called safety conditions. As a result, we can obtain the safety properties of a concrete effect system by proving that an effect algebra corresponding to the concrete system meets the safety conditions. We also show that effect algebras meeting the safety conditions are expressive enough to accommodate some existing effect systems, each of which represents effect collections in a different style. Our framework can also differentiate the safety aspects of the effect collections of the existing effect systems. To this end, we extend ${\lambda_{\mathrm{EA}}}$ and the safety conditions to lift coercions and type-erasure semantics, propose other effect algebras including ones for which no effect system has been studied in the literature, and compare which effect algebra is safe and which is not for the extensions.

[180]  arXiv:2404.16382 [pdf, ps, other]
Title: A Multivariate to Bivariate Reduction for Noncommutative Rank and Related Results
Comments: 31 pages
Subjects: Computational Complexity (cs.CC)

We study the noncommutative rank problem, ncRANK, of computing the rank of matrices with linear entries in $n$ noncommuting variables and the problem of noncommutative Rational Identity Testing, RIT, which is to decide if a given rational formula in $n$ noncommuting variables is zero on its domain of definition. Motivated by the question whether these problems have deterministic NC algorithms, we revisit their interrelationship from a parallel complexity point of view. We show the following results:
1. Based on Cohn's embedding theorem \cite{Co90,Cohnfir} we show deterministic NC reductions from multivariate ncRANK to bivariate ncRANK and from multivariate RIT to bivariate RIT.
2. We obtain a deterministic NC-Turing reduction from bivariate $\RIT$ to bivariate ncRANK, thereby proving that a deterministic NC algorithm for bivariate ncRANK would imply that both multivariate RIT and multivariate ncRANK are in deterministic NC.

[181]  arXiv:2404.16385 [pdf, other]
Title: Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating and adapting Parameter-Efficient Fine-Tuning (PEFT) methods specifically for Med-VLMs is essential. Most of the current PEFT methods on Med-VLMs have yet to be comprehensively investigated but mainly focus on adding some components to the model's structure or input. However, fine-tuning intrinsic model components often yields better generality and consistency, and its impact on the ultimate performance of Med-VLMs has been widely overlooked and remains understudied. In this paper, we endeavour to explore an alternative to traditional PEFT methods, especially the impact of fine-tuning LayerNorm layers, FFNs and Attention layers on the Med-VLMs. Our comprehensive studies span both small-scale and large-scale Med-VLMs, evaluating their performance under various fine-tuning paradigms across tasks such as Medical Visual Question Answering and Medical Imaging Report Generation. The findings reveal unique insights into the effects of intrinsic parameter fine-tuning methods on fine-tuning Med-VLMs to downstream tasks and expose fine-tuning solely the LayerNorm layers not only surpasses the efficiency of traditional PEFT methods but also retains the model's accuracy and generalization capabilities across a spectrum of medical downstream tasks. The experiments show LayerNorm fine-tuning's superior adaptability and scalability, particularly in the context of large-scale Med-VLMs.

[182]  arXiv:2404.16386 [pdf, other]
Title: Promoting CNNs with Cross-Architecture Knowledge Distillation for Efficient Monocular Depth Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recently, the performance of monocular depth estimation (MDE) has been significantly boosted with the integration of transformer models. However, the transformer models are usually computationally-expensive, and their effectiveness in light-weight models are limited compared to convolutions. This limitation hinders their deployment on resource-limited devices. In this paper, we propose a cross-architecture knowledge distillation method for MDE, dubbed DisDepth, to enhance efficient CNN models with the supervision of state-of-the-art transformer models. Concretely, we first build a simple framework of convolution-based MDE, which is then enhanced with a novel local-global convolution module to capture both local and global information in the image. To effectively distill valuable information from the transformer teacher and bridge the gap between convolution and transformer features, we introduce a method to acclimate the teacher with a ghost decoder. The ghost decoder is a copy of the student's decoder, and adapting the teacher with the ghost decoder aligns the features to be student-friendly while preserving their original performance. Furthermore, we propose an attentive knowledge distillation loss that adaptively identifies features valuable for depth estimation. This loss guides the student to focus more on attentive regions, improving its performance. Extensive experiments on KITTI and NYU Depth V2 datasets demonstrate the effectiveness of DisDepth. Our method achieves significant improvements on various efficient backbones, showcasing its potential for efficient monocular depth estimation.

[183]  arXiv:2404.16387 [pdf, other]
Title: Revisiting Restarts of CDCL: Should the Search Information be Preserved?
Subjects: Logic in Computer Science (cs.LO)

SAT solvers are indispensable in formal verification for hardware and software with many important applications. CDCL is the most widely used framework for modern SAT solvers, and restart is an essential technique of CDCL. When restarting, CDCL solvers cancel the current variable assignment while maintaining the branching order, variable phases, and learnt clauses. This type of restart is referred to as warm restart in this paper. Although different restart policies have been studied, there is no study on whether such information should be kept after restarts. This work addresses this question and finds some interesting observations.
This paper indicates that under this popular warm restart scheme, there is a substantial variation in run-time with different randomized initial orders and phases, which motivates us to forget some learned information periodically to prevent being stuck in a disadvantageous search space. We propose a new type of restart called cold restart, which differs from previous restarts by forgetting some of the learned information. Experiments show that modern CDCL solvers can benefit from periodically conducting cold restarts. Based on the analysis of the cold-restart strategies, we develop a parallel SAT solver. Both the sequential and parallel versions of cold restart are more suitable for satisfiable instances, which suggests that existing CDCL heuristics for information management should be revised if one hopes to construct a satisfiable-oriented SAT solver.

[184]  arXiv:2404.16388 [pdf, other]
Title: SwarmRL: Building the Future of Smart Active Systems
Comments: 16 pages, 3 figures
Subjects: Robotics (cs.RO); Soft Condensed Matter (cond-mat.soft); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Biological Physics (physics.bio-ph)

This work introduces SwarmRL, a Python package designed to study intelligent active particles. SwarmRL provides an easy-to-use interface for developing models to control microscopic colloids using classical control and deep reinforcement learning approaches. These models may be deployed in simulations or real-world environments under a common framework. We explain the structure of the software and its key features and demonstrate how it can be used to accelerate research. With SwarmRL, we aim to streamline research into micro-robotic control while bridging the gap between experimental and simulation-driven sciences. SwarmRL is available open-source on GitHub at https://github.com/SwarmRL/SwarmRL.

[185]  arXiv:2404.16391 [pdf, other]
Title: Stability-Oriented Prediction Horizons Design of Generalized Predictive Control for DC/DC Boost Converter
Subjects: Systems and Control (eess.SY)

This paper introduces a novel approach in designing prediction horizons on a generalized predictive control for a DC/DC boost converter. This method involves constructing a closed-loop system model and assessing the impact of different prediction horizons on system stability. In contrast to conventional design approaches that often rely on empirical prediction horizon selection or incorporate non-linear observers, the proposed method establishes a rigorous boundary for the prediction horizon to ensure system stability. This approach facilitates the selection of an appropriate prediction horizon while avoiding excessively short horizons that can lead to instability and preventing the adoption of unnecessarily long horizons that would burden the controller with high computational demands. Finally, the accuracy of the design method has been confirmed through experimental testing. Moreover, it has been demonstrated that the prediction horizon determined by this method reduces the computational burden by 10\%-20\% compared to the empirically selected prediction horizon.

[186]  arXiv:2404.16393 [pdf, other]
Title: Dirigent: Lightweight Serverless Orchestration
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. We find that the current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delay at high sandbox churn, which is typical in FaaS clusters. While generic cluster managers use hierarchical abstractions and multiple internal components to manage and reconcile state with frequent persistent updates, this becomes a bottleneck for FaaS, where cluster state frequently changes as sandboxes are created on the critical path of requests. Based on our root cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration with three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandboxes from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda and can spin up 2500 sandboxes per second at low latency, which is 1250x more than with Knative.

[187]  arXiv:2404.16394 [pdf, ps, other]
Title: STAR-RIS-Assisted Communication Radar Coexistence: Analysis and Optimization
Comments: accepted in IEEE TVT, 14 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

Integrated sensing and communication (ISAC) is expected to play a prominent role among emerging technologies in future wireless communications. In particular, a communication radar coexistence system is degraded significantly by mutual interference. In this work, given the advantages of promising reconfigurable intelligent surface (RIS), we propose a simultaneously transmitting and reflecting RIS (STAR-RIS)-assisted radar coexistence system where a STAR-RIS is introduced to improve the communication performance while suppressing the mutual interference and providing full space coverage. Based on the realistic conditions of correlated fading, and the presence of multiple user equipments (UEs) at both sides of the RIS, we derive the achievable rates at the radar and the communication receiver side in closed forms in terms of statistical channel state information (CSI). Next, we perform alternating optimization (AO) for optimizing the STAR-RIS and the radar beamforming. Regarding the former, we optimize the amplitudes and phase shifts of the STAR-RIS through a projected gradient ascent algorithm (PGAM) simultaneously with respect to the amplitudes and phase shifts of the surface for both energy splitting (ES) and mode switching (MS) operation protocols. The proposed optimization saves enough overhead since it can be performed every several coherence intervals. This property is particularly beneficial compared to reflecting-only RIS because a STAR-RIS includes the double number of variables, which require increased overhead. Finally, simulation results illustrate how the proposed architecture outperforms the conventional RIS counterpart, and show how the various parameters affect the performance. Moreover, a benchmark full instantaneous CSI (I-CSI) based design is provided and shown to result in higher sum-rate but also in large overhead associated with complexity.

[188]  arXiv:2404.16395 [pdf, other]
Title: Fuzzy Inference System for Test Case Prioritization in Software Testing
Comments: The article has been submitted to IEEE for consideration
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

In the realm of software development, testing is crucial for ensuring software quality and adherence to requirements. However, it can be time-consuming and resource-intensive, especially when dealing with large and complex software systems. Test case prioritization (TCP) is a vital strategy to enhance testing efficiency by identifying the most critical test cases for early execution. This paper introduces a novel fuzzy logic-based approach to automate TCP, using fuzzy linguistic variables and expert-derived fuzzy rules to establish a link between test case characteristics and their prioritization. Our methodology utilizes two fuzzy variables - failure rate and execution time - alongside two crisp parameters: Prerequisite Test Case and Recently Updated Flag. Our findings demonstrate the proposed system capacity to rank test cases effectively through experimental validation on a real-world software system. The results affirm the practical applicability of our approach in optimizing the TCP and reducing the resource intensity of software testing.

[189]  arXiv:2404.16398 [pdf, other]
Title: Revisiting Relevance Feedback for CLIP-based Interactive Image Retrieval
Comments: 20 pages, 8 sugures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Many image retrieval studies use metric learning to train an image encoder. However, metric learning cannot handle differences in users' preferences, and requires data to train an image encoder. To overcome these limitations, we revisit relevance feedback, a classic technique for interactive retrieval systems, and propose an interactive CLIP-based image retrieval system with relevance feedback. Our retrieval system first executes the retrieval, collects each user's unique preferences through binary feedback, and returns images the user prefers. Even when users have various preferences, our retrieval system learns each user's preference through the feedback and adapts to the preference. Moreover, our retrieval system leverages CLIP's zero-shot transferability and achieves high accuracy without training. We empirically show that our retrieval system competes well with state-of-the-art metric learning in category-based image retrieval, despite not training image encoders specifically for each dataset. Furthermore, we set up two additional experimental settings where users have various preferences: one-label-based image retrieval and conditioned image retrieval. In both cases, our retrieval system effectively adapts to each user's preferences, resulting in improved accuracy compared to image retrieval without feedback. Overall, our work highlights the potential benefits of integrating CLIP with classic relevance feedback techniques to enhance image retrieval.

[190]  arXiv:2404.16399 [pdf, other]
Title: Offline Reinforcement Learning with Behavioral Supervisor Tuning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.

[191]  arXiv:2404.16405 [pdf, other]
Title: Lost in Recursion: Mining Rich Event Semantics in Knowledge Graphs
Comments: Accepted at WebSci'24, 11 pages, 4 figures
Subjects: Computation and Language (cs.CL)

Our world is shaped by events of various complexity. This includes both small-scale local events like local farmer markets and large complex events like political and military conflicts. The latter are typically not observed directly but through the lenses of intermediaries like newspapers or social media. In other words, we do not witness the unfolding of such events directly but are confronted with narratives surrounding them. Such narratives capture different aspects of a complex event and may also differ with respect to the narrator. Thus, they provide a rich semantics concerning real-world events. In this paper, we show how narratives concerning complex events can be constructed and utilized. We provide a formal representation of narratives based on recursive nodes to represent multiple levels of detail and discuss how narratives can be bound to event-centric knowledge graphs. Additionally, we provide an algorithm based on incremental prompting techniques that mines such narratives from texts to account for different perspectives on complex events. Finally, we show the effectiveness and future research directions in a proof of concept.

[192]  arXiv:2404.16406 [pdf, ps, other]
Title: Regular Typed Unification
Comments: 19 pages
Subjects: Logic in Computer Science (cs.LO)

Here we define a new unification algorithm for terms interpreted in semantic domains denoted by a subclass of regular types here called deterministic regular types. This reflects our intention not to handle the semantic universe as a homogeneous collection of values, but instead, to partition it in a way that is similar to data types in programming languages. We first define the new unification algorithm which is based on constraint generation and constraint solving, and then prove its main properties: termination, soundness, and completeness with respect to the semantics. Finally, we discuss how to apply this algorithm to a dynamically typed version of Prolog.

[193]  arXiv:2404.16407 [pdf, other]
Title: U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to even larger and more capable language models and this shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporating MoE into ASR models have complex designs such as routing frames via supplementary embedding network, improving multilingual ability for the experts, and utilizing dedicated auxiliary losses for either expert load balancing or specific language handling. We found that delicate designs are not necessary, while an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for the ASR task. To be more specific, we benchmark our proposed model on a large scale inner-source dataset (160k hours), the results show that we can scale our baseline Conformer (Dense-225M) to its MoE counterparts (MoE-1B) and achieve Dense-1B level Word Error Rate (WER) while maintaining a Dense-225M level Real Time Factor (RTF). Furthermore, by applying Unified 2-pass framework with bidirectional attention decoders (U2++), we achieve the streaming and non-streaming decoding modes in a single MoE based model, which we call U2++ MoE. We hope that our study can facilitate the research on scaling speech foundation models without sacrificing deployment efficiency.

[194]  arXiv:2404.16408 [pdf, other]
Title: Event-Triggered Resilient Filtering for 2-D Systems with Asynchronous-Delay: Handling Binary Encoding Decoding with Probabilistic Bit Flips
Authors: Yu Chen, Wei Wang
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

In this paper, the event-triggered resilient filtering problem is investigated for a class of two-dimensional systems with asynchronous-delay under binary encoding-decoding schemes with probabilistic bit flips. To reduce unnecessary communications and computations in complex network systems, alleviate network energy consumption, and optimize the use of network resources, a new event-triggered mechanism is proposed, which focuses on broadcasting necessary measurement information to update innovation only when the event generator function is satisfied. A binary encoding-decoding scheme is used in the communication process to quantify the measurement information into a bit stream, transmit it through a memoryless binary symmetric channel with a certain probability of bit flipping, and restore it at the receiver. In order to utilize the delayed decoded measurement information that a measurement reconstruction approach is proposed. Through generating space equivalence verification, it is found that the reconstructed delay-free decoded measurement sequence contains the same information as the original delayed decoded measurement sequence. In addition, resilient filter is utilized to accommodate possible estimation gain perturbations. Then, a recursive estimator framework is presented based on the reconstructed decoded measurement sequence. By means of the mathematical induction technique, the unbiased property of the proposed estimator is proved. The estimation gain is obtained by minimizing an upper bound on the filtering error covariance. Subsequently, through rigorous mathematical analysis, the monotonicity of filtering performance with respect to triggering parameters is discussed.

[195]  arXiv:2404.16409 [pdf, other]
Title: Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series
Authors: Aimi Okabayashi (IRISA, OBELIX), Nicolas Audebert (CEDRIC - VERTIGO, CNAM, LaSTIG, IGN), Simon Donike (IPL), Charlotte Pelletier (OBELIX, IRISA)
Journal-ref: EARTHVISION 2024 IEEE/CVF CVPR Workshop. Large Scale Computer Vision for Remote Sensing Imagery, Jun 2024, Seattle, United States
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Satellite imaging generally presents a trade-off between the frequency of acquisitions and the spatial resolution of the images. Super-resolution is often advanced as a way to get the best of both worlds. In this work, we investigate multi-image super-resolution of satellite image time series, i.e. how multiple images of the same area acquired at different dates can help reconstruct a higher resolution observation. In particular, we extend state-of-the-art deep single and multi-image super-resolution algorithms, such as SRDiff and HighRes-net, to deal with irregularly sampled Sentinel-2 time series. We introduce BreizhSR, a new dataset for 4x super-resolution of Sentinel-2 time series using very high-resolution SPOT-6 imagery of Brittany, a French region. We show that using multiple images significantly improves super-resolution performance, and that a well-designed temporal positional encoding allows us to perform super-resolution for different times of the series. In addition, we observe a trade-off between spectral fidelity and perceptual quality of the reconstructed HR images, questioning future directions for super-resolution of Earth Observation data.

[196]  arXiv:2404.16411 [pdf, other]
Title: Label-Free Topic-Focused Summarization Using Query Augmentation
Subjects: Artificial Intelligence (cs.AI)

In today's data and information-rich world, summarization techniques are essential in harnessing vast text to extract key information and enhance decision-making and efficiency. In particular, topic-focused summarization is important due to its ability to tailor content to specific aspects of an extended text. However, this usually requires extensive labelled datasets and considerable computational power. This study introduces a novel method, Augmented-Query Summarization (AQS), for topic-focused summarization without the need for extensive labelled datasets, leveraging query augmentation and hierarchical clustering. This approach facilitates the transferability of machine learning models to the task of summarization, circumventing the need for topic-specific training. Through real-world tests, our method demonstrates the ability to generate relevant and accurate summaries, showing its potential as a cost-effective solution in data-rich environments. This innovation paves the way for broader application and accessibility in the field of topic-focused summarization technology, offering a scalable, efficient method for personalized content extraction.

[197]  arXiv:2404.16412 [pdf, ps, other]
Title: Distributed Matrix Pencil Formulations for Prescribed-Time Leader-Following Consensus of MASs with Unknown Sensor Sensitivity
Comments: 10 pages, 1 figure
Subjects: Systems and Control (eess.SY)

In this paper, we address the problem of prescribed-time leader-following consensus of heterogeneous multi-agent systems (MASs) in the presence of unknown sensor sensitivity. Under a connected undirected topology, we propose a time-varying dual observer/controller design framework that makes use of regular local and inaccurate feedback to achieve consensus tracking within a prescribed time. In particular, the developed analysis framework is applicable to MASs equipped with sensors of different sensitivities. One of the design innovations involves constructing a distributed matrix pencil formulation based on worst-case sensors, yielding control parameters with sufficient robustness yet relatively low conservatism. Another novelty is the construction of the control gains, which consists of the product of a proportional coefficient obtained from the matrix pencil formulation and a classic time-varying function that grows to infinity or a novel bounded time-varying function. Furthermore, it is possible to extend the prescribed-time distributed protocol to infinite time domain by introducing the bounded time-varying gain technique without sacrificing the ultimate control accuracy, and the corresponding technical proof is comprehensive. The effectiveness of the method is demonstrated through a group of 5 single-link robot manipulators.

[198]  arXiv:2404.16413 [pdf, other]
Title: Asking and Answering Questions to Extract Event-Argument Structures
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

This paper presents a question-answering approach to extract document-level event-argument structures. We automatically ask and answer questions for each argument type an event may have. Questions are generated using manually defined templates and generative transformers. Template-based questions are generated using predefined role-specific wh-words and event triggers from the context document. Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer. Additionally, we develop novel data augmentation strategies specialized in inter-sentential event-argument relations. We use a simple span-swapping technique, coreference resolution, and large language models to augment the training instances. Our approach enables transfer learning without any corpora-specific modifications and yields competitive results with the RAMS dataset. It outperforms previous work, and it is especially beneficial to extract arguments that appear in different sentences than the event trigger. We also present detailed quantitative and qualitative analyses shedding light on the most common errors made by our best model.

[199]  arXiv:2404.16416 [pdf, other]
Title: Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition
Comments: 10 pages, 6 figures, 6 tables, 56 conferences
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a few labeled data in conjunction with a large amount of unlabeled data. Albeit recent advancements, existing powerful methods are still prone to making ambiguous predictions under scarce labeled data, embodied as the limitation of distinguishing different actions with similar spatio-temporal information. In this paper, we approach this problem by empowering the model two aspects of capability, namely discriminative spatial modeling and temporal structure modeling for learning discriminative spatio-temporal representations. Specifically, we propose an Adaptive Contrastive Learning~(ACL) strategy. It assesses the confidence of all unlabeled samples by the class prototypes of the labeled data, and adaptively selects positive-negative samples from a pseudo-labeled sample bank to construct contrastive learning. Additionally, we introduce a Multi-scale Temporal Learning~(MTL) strategy. It could highlight informative semantics from long-term clips and integrate them into the short-term clip while suppressing noisy information. Afterwards, both of these two new techniques are integrated in a unified framework to encourage the model to make accurate predictions. Extensive experiments on UCF101, HMDB51 and Kinetics400 show the superiority of our method over prior state-of-the-art approaches.

[200]  arXiv:2404.16418 [pdf, other]
Title: Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks
Comments: 21 pages, 6 figures, 16 tables
Subjects: Computation and Language (cs.CL)

Instruction tuning has shown its ability to not only enhance zero-shot generalization across various tasks but also its effectiveness in improving the performance of specific tasks. A crucial aspect in instruction tuning for a particular task is a strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelevant tasks. Our research reveals that leveraging instruction information \textit{alone} enables the identification of pertinent tasks for instruction tuning. This approach is notably simpler compared to traditional methods that necessitate complex measurements of pairwise transferability between tasks or the creation of data samples for the target task. Furthermore, by additionally learning the unique instructional template style of the meta-dataset, we observe an improvement in task selection accuracy, which contributes to enhanced overall performance. Experimental results demonstrate that training on a small set of tasks, chosen solely based on the instructions, leads to substantial performance improvements on benchmarks like P3, Big-Bench, NIV2, and Big-Bench Hard. Significantly, these improvements exceed those achieved by prior task selection methods, highlighting the efficacy of our approach.

[201]  arXiv:2404.16421 [pdf, other]
Title: SynCellFactory: Generative Data Augmentation for Cell Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cell tracking remains a pivotal yet challenging task in biomedical research. The full potential of deep learning for this purpose is often untapped due to the limited availability of comprehensive and varied training data sets. In this paper, we present SynCellFactory, a generative cell video augmentation. At the heart of SynCellFactory lies the ControlNet architecture, which has been fine-tuned to synthesize cell imagery with photorealistic accuracy in style and motion patterns. This technique enables the creation of synthetic yet realistic cell videos that mirror the complexity of authentic microscopy time-lapses. Our experiments demonstrate that SynCellFactory boosts the performance of well-established deep learning models for cell tracking, particularly when original training data is sparse.

[202]  arXiv:2404.16422 [pdf, other]
Title: Robust Fine-tuning for Pre-trained 3D Point Cloud Models
Comments: 9 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents a robust fine-tuning method designed for pre-trained 3D point cloud models, to enhance feature robustness in downstream fine-tuned models. We highlight the limitations of current fine-tuning methods and the challenges of learning robust models. The proposed method, named Weight-Space Ensembles for Fine-Tuning then Linear Probing (WiSE-FT-LP), integrates the original pre-training and fine-tuning models through weight space integration followed by Linear Probing. This approach significantly enhances the performance of downstream fine-tuned models under distribution shifts, improving feature robustness while maintaining high performance on the target distribution. We apply this robust fine-tuning method to mainstream 3D point cloud pre-trained models and evaluate the quality of model parameters and the degradation of downstream task performance. Experimental results demonstrate the effectiveness of WiSE-FT-LP in enhancing model robustness, effectively balancing downstream task performance and model feature robustness without altering the model structures.

[203]  arXiv:2404.16423 [pdf, other]
Title: Neural Assembler: Learning to Generate Fine-Grained Robotic Assembly Instructions from Multi-View Images
Authors: Hongyu Yan, Yadong Mu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Image-guided object assembly represents a burgeoning research topic in computer vision. This paper introduces a novel task: translating multi-view images of a structural 3D model (for example, one constructed with building blocks drawn from a 3D-object library) into a detailed sequence of assembly instructions executable by a robotic arm. Fed with multi-view images of the target 3D model for replication, the model designed for this task must address several sub-tasks, including recognizing individual components used in constructing the 3D model, estimating the geometric pose of each component, and deducing a feasible assembly order adhering to physical rules. Establishing accurate 2D-3D correspondence between multi-view images and 3D objects is technically challenging. To tackle this, we propose an end-to-end model known as the Neural Assembler. This model learns an object graph where each vertex represents recognized components from the images, and the edges specify the topology of the 3D model, enabling the derivation of an assembly plan. We establish benchmarks for this task and conduct comprehensive empirical evaluations of Neural Assembler and alternative solutions. Our experiments clearly demonstrate the superiority of Neural Assembler.

[204]  arXiv:2404.16429 [pdf, other]
Title: Depth Supervised Neural Surface Reconstruction from Airborne Imagery
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While originally developed for novel view synthesis, Neural Radiance Fields (NeRFs) have recently emerged as an alternative to multi-view stereo (MVS). Triggered by a manifold of research activities, promising results have been gained especially for texture-less, transparent, and reflecting surfaces, while such scenarios remain challenging for traditional MVS-based approaches. However, most of these investigations focus on close-range scenarios, with studies for airborne scenarios still missing. For this task, NeRFs face potential difficulties at areas of low image redundancy and weak data evidence, as often found in street canyons, facades or building shadows. Furthermore, training such networks is computationally expensive. Thus, the aim of our work is twofold: First, we investigate the applicability of NeRFs for aerial image blocks representing different characteristics like nadir-only, oblique and high-resolution imagery. Second, during these investigations we demonstrate the benefit of integrating depth priors from tie-point measures, which are provided during presupposed Bundle Block Adjustment. Our work is based on the state-of-the-art framework VolSDF, which models 3D scenes by signed distance functions (SDFs), since this is more applicable for surface reconstruction compared to the standard volumetric representation in vanilla NeRFs. For evaluation, the NeRF-based reconstructions are compared to results of a publicly available benchmark dataset for airborne images.

[205]  arXiv:2404.16430 [pdf, ps, other]
Title: FO logic on cellular automata orbits equals MSO logic
Authors: Guillaume Theyssier (I2M)
Subjects: Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Dynamical Systems (math.DS)

We introduce an extension of classical cellular automata (CA) to arbitrary labeled graphs, and show that FO logic on CA orbits is equivalent to MSO logic. We deduce various results from that equivalence, including a characterization of finitely generated groups on which FO model checking for CA orbits is undecidable, and undecidability of satisfiability of a fixed FO property for CA over finite graphs. We also show concrete examples of FO formulas for CA orbits whose model checking problem is equivalent to the domino problem, or its seeded or recurring variants respectively, on any finitely generated group. For the recurring domino problem, we use an extension of the FO signature by a relation found in the well-known Garden of Eden theorem, but we also show a concrete FO formula without the extension and with one quantifier alternation whose model checking problem does not belong to the arithmetical hierarchy on group Z^2.

[206]  arXiv:2404.16431 [pdf, ps, other]
Title: Secure Coded Distributed Computing
Comments: 6 pages
Subjects: Information Theory (cs.IT)

In this paper, we consider two critical aspects of security in the \textit{distributed computing (DC)} model: \textit{secure data shuffling} and \textit{secure coded computing}. It is imperative that any external entity overhearing the transmissions does not gain any information about the \textit{intermediate values (IVs)} exchanged during the shuffling phase of the DC model. Our approach ensures IV confidentiality during data shuffling. Moreover, each node in the system must be able to recover the IVs necessary for computing its output functions but must also remain oblivious to the IVs associated with output functions not assigned to it. We design secure DC methods and establish achievable limits on the tradeoffs between the communication and computation loads to contribute to the advancement of secure data processing in distributed systems.

[207]  arXiv:2404.16432 [pdf, other]
Title: Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud
Comments: 10 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advancements in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks, including lengthy pre-training time, the necessity of reconstruction in the input space, or the necessity of additional modalities. In order to address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud tokens to efficiently compute and utilize tokens proximity based on their indices during target and context selection. The sequencer also allows shared computations of the tokens proximity between context and target selection, further improving the efficiency. Experimentally, our method achieves competitive results with state-of-the-art methods while avoiding the reconstruction in the input space or additional modality.

[208]  arXiv:2404.16436 [pdf, ps, other]
Title: Leveraging tropical reef, bird and unrelated sounds for superior transfer learning in marine bioacoustics
Comments: 18 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Machine learning has the potential to revolutionize passive acoustic monitoring (PAM) for ecological assessments. However, high annotation and compute costs limit the field's efficacy. Generalizable pretrained networks can overcome these costs, but high-quality pretraining requires vast annotated libraries, limiting its current applicability primarily to bird taxa. Here, we identify the optimum pretraining strategy for a data-deficient domain using coral reef bioacoustics. We assemble ReefSet, a large annotated library of reef sounds, though modest compared to bird libraries at 2% of the sample count. Through testing few-shot transfer learning performance, we observe that pretraining on bird audio provides notably superior generalizability compared to pretraining on ReefSet or unrelated audio alone. However, our key findings show that cross-domain mixing which leverages bird, reef and unrelated audio during pretraining maximizes reef generalizability. SurfPerch, our pretrained network, provides a strong foundation for automated analysis of marine PAM data with minimal annotation and compute costs.

[209]  arXiv:2404.16442 [pdf, other]
Title: Contextual Categorization Enhancement through LLMs Latent-Space
Journal-ref: Fifteenth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2024), ISSN: 2308-4170
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Managing the semantic quality of the categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and its associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is powered by Convex Hull, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. As a solution to the information loss caused by the dimensionality reduction, we modulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function represents a filter built around a contextual category and retrieves items with a certain Reconsideration Probability (RP). Retrieving high-RP items serves as a tool for database administrators to improve data groupings by providing recommendations and identifying outliers within a contextual framework.

[210]  arXiv:2404.16443 [pdf, ps, other]
Title: Tightening I/O Lower Bounds through the Hourglass Dependency Pattern
Authors: Lionel Eyraud-Dubois (TOPAL), Guillaume Iooss (CORSE), Julien Langou (ROMA), Fabrice Rastello (CORSE)
Journal-ref: 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '24), Jun 2024, Nantes, France
Subjects: Computational Complexity (cs.CC); Distributed, Parallel, and Cluster Computing (cs.DC)

When designing an algorithm, one cares about arithmetic/computational complexity, but data movement (I/O) complexity plays an increasingly important role that highly impacts performance and energy consumption. For a given algorithm and a given I/O model, scheduling strategies such as loop tiling can reduce the required I/O down to a limit, called the I/O complexity, inherent to the algorithm itself. The objective of I/O complexity analysis is to compute, for a given program, its minimal I/O requirement among all valid schedules. We consider a sequential execution model with two memories, an infinite one, and a small one of size S on which the computations retrieve and produce data. The I/O is the number of reads and writes between the two memories. We identify a common "hourglass pattern" in the dependency graphs of several common linear algebra kernels. Using the properties of this pattern, we mathematically prove tighter lower bounds on their I/O complexity, which improves the previous state-of-the-art bound by a parametric ratio. This proof was integrated inside the IOLB automatic lower bound derivation tool.

[211]  arXiv:2404.16444 [pdf, other]
Title: Automating the Discovery of Partial Differential Equations in Dynamical Systems
Comments: 18 pages, 6 figures, 1 table
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Applications (stat.AP); Machine Learning (stat.ML)

Identifying partial differential equations (PDEs) from data is crucial for understanding the governing mechanisms of natural phenomena, yet it remains a challenging task. We present an extension to the ARGOS framework, ARGOS-RAL, which leverages sparse regression with the recurrent adaptive lasso to identify PDEs from limited prior knowledge automatically. Our method automates calculating partial derivatives, constructing a candidate library, and estimating a sparse model. We rigorously evaluate the performance of ARGOS-RAL in identifying canonical PDEs under various noise levels and sample sizes, demonstrating its robustness in handling noisy and non-uniformly distributed data. We also test the algorithm's performance on datasets consisting solely of random noise to simulate scenarios with severely compromised data quality. Our results show that ARGOS-RAL effectively and reliably identifies the underlying PDEs from data, outperforming the sequential threshold ridge regression method in most cases. We highlight the potential of combining statistical methods, machine learning, and dynamical systems theory to automatically discover governing equations from collected data, streamlining the scientific modeling process.

[212]  arXiv:2404.16446 [pdf, other]
Title: On Software Ageing Indicators in OpenStack
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed systems in general and cloud systems in particular, are susceptible to failures that can lead to substantial economic and data losses, security breaches, and even potential threats to human safety. Software ageing is an example of one such vulnerability. It emerges due to routine re-usage of computational systems units which induce fatigue within the components, resulting in an increased failure rate and potential system breakdown. Due to its stochastic nature, ageing cannot be directly measured, instead ageing indicators as proxies are used. While there are dozens of studies on different ageing indicators, their comprehensive comparison in different settings remains underexplored. In this paper, we compare two ageing indicators in OpenStack as a use case. Specifically, our evaluation compares memory usage (including swap memory) and request response time, as readily available indicators. By executing multiple OpenStack deployments with varying configurations, we conduct a series of experiments and analyze the ageing indicators. Comparative analysis through statistical tests provides valuable insights into the strengths and weaknesses of the utilised ageing indicators. Finally, through an in-depth analysis of other OpenStack failures, we identify underlying failure patterns and their impact on the studied ageing indicators.

[213]  arXiv:2404.16451 [pdf, other]
Title: Latent Modulated Function for Computational Optimal Continuous Image Representation
Authors: Zongyao He, Zhi Jin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The recent work Local Implicit Image Function (LIIF) and subsequent Implicit Neural Representation (INR) based works have achieved remarkable success in Arbitrary-Scale Super-Resolution (ASSR) by using MLP to decode Low-Resolution (LR) features. However, these continuous image representations typically implement decoding in High-Resolution (HR) High-Dimensional (HD) space, leading to a quadratic increase in computational cost and seriously hindering the practical applications of ASSR. To tackle this problem, we propose a novel Latent Modulated Function (LMF), which decouples the HR-HD decoding process into shared latent decoding in LR-HD space and independent rendering in HR Low-Dimensional (LD) space, thereby realizing the first computational optimal paradigm of continuous image representation. Specifically, LMF utilizes an HD MLP in latent space to generate latent modulations of each LR feature vector. This enables a modulated LD MLP in render space to quickly adapt to any input feature vector and perform rendering at arbitrary resolution. Furthermore, we leverage the positive correlation between modulation intensity and input image complexity to design a Controllable Multi-Scale Rendering (CMSR) algorithm, offering the flexibility to adjust the decoding efficiency based on the rendering precision. Extensive experiments demonstrate that converting existing INR-based ASSR methods to LMF can reduce the computational cost by up to 99.9%, accelerate inference by up to 57 times, and save up to 76% of parameters, while maintaining competitive performance. The code is available at https://github.com/HeZongyao/LMF.

[214]  arXiv:2404.16452 [pdf, other]
Title: PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, independent of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as distinct image quality of the patch area that differs from original clean image due to the independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training. PAD offers patch-agnostic defense against various adversarial patches, compatible with any pre-trained object detectors. Our comprehensive digital and physical experiments involving diverse patch types, such as localized noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at https://github.com/Lihua-Jing/PAD.

[215]  arXiv:2404.16455 [pdf, other]
Title: Canonical Decision Diagrams Modulo Theories
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

Decision diagrams (DDs) are powerful tools to represent effectively propositional formulas, which are largely used in many domains, in particular in formal verification and in knowledge compilation. Some forms of DDs (e.g., OBDDs, SDDs) are canonical, that is, (under given conditions on the atom list) they univocally represent equivalence classes of formulas. Given the limited expressiveness of propositional logic, a few attempts to leverage DDs to SMT level have been presented in the literature. Unfortunately, these techniques still suffer from some limitations: most procedures are theory-specific; some produce theory DDs (T-DDs) which do not univocally represent T-valid formulas or T-inconsistent formulas; none of these techniques provably produces theory-canonical T-DDs, which (under given conditions on the T-atom list) univocally represent T-equivalence classes of formulas. Also, these procedures are not easy to implement, and very few implementations are actually available. In this paper, we present a novel very-general technique to leverage DDs to SMT level, which has several advantages: it is very easy to implement on top of an AllSMT solver and a DD package, which are used as blackboxes; it works for every form of DDs and every theory, or combination thereof, supported by the AllSMT solver; it produces theory-canonical T-DDs if the propositional DD is canonical. We have implemented a prototype tool for both T-OBDDs and T-SDDs on top of OBDD and SDD packages and the MathSAT SMT solver. Some preliminary empirical evaluation supports the effectiveness of the approach.

[216]  arXiv:2404.16456 [pdf, other]
Title: Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality completeness. However, in real-world applications, some practical factors cause uncertain modality missingness, which drastically degrades the model's performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the MSA task under uncertain missing modalities. Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. Moreover, a category-guided prototype distillation mechanism is introduced to capture cross-category correlations using category prototypes to align feature distributions and generate favorable joint representations. Eventually, we design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network through response disentanglement and mutual information maximization. Comprehensive experiments on three datasets indicate that our framework can achieve favorable improvements compared with several baselines.

[217]  arXiv:2404.16457 [pdf, other]
Title: Towards Precise Observations of Neural Model Robustness in Classification
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)

In deep learning applications, robustness measures the ability of neural models that handle slight changes in input data, which could lead to potential safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metrics that effectively capture the model's robustness are needed. To address this issue, we compare the rigour and usage conditions of various assessment methods based on different definitions. Then, we propose a straightforward and practical metric utilizing hypothesis testing for probabilistic robustness and have integrated it into the TorchAttacks library. Through a comparative analysis of diverse robustness assessment methods, our approach contributes to a deeper understanding of model robustness in safety-critical applications.

[218]  arXiv:2404.16461 [pdf, other]
Title: Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums
Subjects: Computation and Language (cs.CL)

Mental health in children and adolescents has been steadily deteriorating over the past few years [ 1 ]. The recent advent of Large Language Models (LLMs) offers much hope for cost and time efficient scaling of monitoring and intervention, yet despite specifically prevalent issues such as school bullying and eating disorders, previous studies on have not investigated performance in this domain or for open information extraction where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT and compare expert labels to annotations from two top performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher, however we find the model still occasionally errs on issues of negation and factuality and higher performance on synthetic data is driven by greater complexity of real data rather than inherent advantage.

[219]  arXiv:2404.16462 [pdf, other]
Title: Blockchain-enabled Energy Trading and Battery-based Sharing in Microgrids
Comments: 6 pages, 3 figures, 2 tables. Accepted to be published in the IEEE 2024 International Conference on Communications (ICC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Carbon footprint reduction can be achieved through various methods, including the adoption of renewable energy sources. The installation of such sources, like photovoltaic panels, while environmentally beneficial, is cost-prohibitive for many. Those lacking photovoltaic solutions typically resort to purchasing energy from utility grids that often rely on fossil fuels. Moreover, when users produce their own energy, they may generate excess that goes unused, leading to inefficiencies. To address these challenges, this paper proposes innovative blockchain-enabled energy-sharing algorithms that allow consumers -- without financial means -- to access energy through the use of their own energy storage units. We explore two sharing models: a centralized method and a peer-to-peer (P2P) one. Our analysis reveals that the P2P model is more effective, enhancing the sharing process significantly compared to the centralized method. We also demonstrate that, when contrasted with traditional battery-supported trading algorithm, the P2P sharing algorithm substantially reduces wasted energy and energy purchases from the grid by 73.6%, and 12.3% respectively. The proposed system utilizes smart contracts to decentralize its structure, address the single point of failure concern, improve overall system transparency, and facilitate peer-to-peer payments.

[220]  arXiv:2404.16464 [pdf, other]
Title: Sublinear-Time Opinion Estimation in the Friedkin--Johnsen Model
Comments: To appear at the 2024 ACM Web Conference
Subjects: Social and Information Networks (cs.SI)

Online social networks are ubiquitous parts of modern societies and the discussions that take place in these networks impact people's opinions on diverse topics, such as politics or vaccination. One of the most popular models to formally describe this opinion formation process is the Friedkin--Johnsen (FJ) model, which allows to define measures, such as the polarization and the disagreement of a network. Recently, Xu, Bao and Zhang (WebConf'21) showed that all opinions and relevant measures in the FJ model can be approximated in near-linear time. However, their algorithm requires the entire network and the opinions of all nodes as input. Given the sheer size of online social networks and increasing data-access limitations, obtaining the entirety of this data might, however, be unrealistic in practice. In this paper, we show that node opinions and all relevant measures, like polarization and disagreement, can be efficiently approximated in time that is sublinear in the size of the network. Particularly, our algorithms only require query-access to the network and do not have to preprocess the graph. Furthermore, we use a connection between FJ opinion dynamics and personalized PageRank, and show that in $d$-regular graphs, we can deterministically approximate each node's opinion by only looking at a constant-size neighborhood, independently of the network size. We also experimentally validate that our estimation algorithms perform well in practice.

[221]  arXiv:2404.16468 [pdf, other]
Title: A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we try to unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is reveiled. Furthermore, using this framework, we are able to introduce some novel types of constraints, allowing to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training using trainable reward modifications. The resulting $\texttt{DualCRL}$ method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints.

[222]  arXiv:2404.16471 [pdf, other]
Title: COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a generic algorithm for scoring pose estimation methods that rely on single image semantic analysis. The algorithm employs a lightweight putative shape representation using a combination of multiple Gaussian Processes. Each Gaussian Process (GP) yields distance normal distributions from multiple reference points in the object's coordinate system to its surface, thus providing a geometric evaluation framework for scoring predicted poses. Our confidence measure comprises the average mixture probability of pixel back-projections onto the shape template. In the reported experiments, we compare the accuracy of our GP based representation of objects versus the actual geometric models and demonstrate the ability of our method to capture the influence of outliers as opposed to the corresponding intrinsic measures that ship with the segmentation and pose estimation methods.

[223]  arXiv:2404.16473 [pdf, other]
Title: Impact of spatial auditory navigation on user experience during augmented outdoor navigation tasks
Subjects: Human-Computer Interaction (cs.HC)

The auditory sense of humans is important when it comes to navigation. The importance is especially high in cases when an object of interest is visually partly or fully covered. Interactions with users of technology are mainly focused on the visual domain of navigation tasks. This paper presents the results of a literature review and user study exploring the impact of spatial auditory navigation on user experience during an augmented outdoor navigation task. For the user test, participants used an augmented reality app guiding them to different locations with different digital augmentation. We conclude that the utilization of the auditory sense is yet still underrepresented in augmented reality applications. In the future, more usage scenarios for audio-augmented reality such as navigation will enhance user experience and interaction quality.

[224]  arXiv:2404.16474 [pdf, other]
Title: DiffSeg: A Segmentation Model for Skin Lesions Based on Diffusion Difference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Weakly supervised medical image segmentation (MIS) using generative models is crucial for clinical diagnosis. However, the accuracy of the segmentation results is often limited by insufficient supervision and the complex nature of medical imaging. Existing models also only provide a single outcome, which does not allow for the measurement of uncertainty. In this paper, we introduce DiffSeg, a segmentation model for skin lesions based on diffusion difference which exploits diffusion model principles to ex-tract noise-based features from images with diverse semantic information. By discerning difference between these noise features, the model identifies diseased areas. Moreover, its multi-output capability mimics doctors' annotation behavior, facilitating the visualization of segmentation result consistency and ambiguity. Additionally, it quantifies output uncertainty using Generalized Energy Distance (GED), aiding interpretability and decision-making for physicians. Finally, the model integrates outputs through the Dense Conditional Random Field (DenseCRF) algorithm to refine the segmentation boundaries by considering inter-pixel correlations, which improves the accuracy and optimizes the segmentation results. We demonstrate the effectiveness of DiffSeg on the ISIC 2018 Challenge dataset, outperforming state-of-the-art U-Net-based methods.

[225]  arXiv:2404.16478 [pdf, other]
Title: Evaluating Consistency and Reasoning Capabilities of Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance, for tasks such as text generation, summarization, and translation. Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate. This behavior can be attributed to several factors, with consistency and reasoning capabilities being significant contributors. LLMs frequently lack the ability to generate explanations and engage in coherent reasoning, leading to inaccurate responses. Moreover, they exhibit inconsistencies in their outputs. This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs. The experiments utilize the Boolq dataset as the ground truth, comprising questions, answers, and corresponding explanations. Queries from the dataset are presented as prompts to the LLMs, and the generated responses are evaluated against the ground truth answers. Additionally, explanations are generated to assess the models' reasoning abilities. Consistency is evaluated by repeatedly presenting the same query to the models and observing for variations in their responses. For measuring reasoning capabilities, the generated explanations are compared to the ground truth explanations using metrics such as BERT, BLEU, and F-1 scores. The findings reveal that proprietary models generally outperform public models in terms of both consistency and reasoning capabilities. However, even when presented with basic general knowledge questions, none of the models achieved a score of 90\% in both consistency and reasoning. This study underscores the direct correlation between consistency and reasoning abilities in LLMs and highlights the inherent reasoning challenges present in current language models.

[226]  arXiv:2404.16479 [pdf, other]
Title: The Impact of Social Environment and Interaction Focus on User Experience and Social Acceptability of an Augmented Reality Game
Subjects: Human-Computer Interaction (cs.HC)

One of the most promising technologies inside the Extended Reality (XR) spectrum is Augmented Reality. This technology is already in people's pockets regarding Mobile Augmented Reality with their smartphones. The scientific community still needs answers about how humans could and should interact in environments where perceived stimuli are different from fully physical or digital circumstances. Moreover, it is still being determined if people accept these new technologies in different social environments and interaction settings or if some obstacles could exist. This paper explores the impact of the Social Environment and the Focus of social interaction on users while playing a location-based augmented reality game, measuring it with user experience and social acceptance indicators. An empirical study in a within-subject fashion was performed in different social environments and under different settings of social interaction focus with N = 28 participants compiling self-reported questionnaires after playing a Scavenger Hunt in Augmented Reality. The measures from two different Social Environments (Crowded vs. Uncrowded) resulted in statistically relevant mean differences with indicators from the Social Acceptability dimension. Moreover, the analyses show statistically relevant differences between the variances from different degrees of Social Interaction Focus with Overall Social Presence, Perceived Psychological Engagement, Perceived Attentional Engagement, and Perceived Emotional Contagion. The results suggest that a location-based AR game played in different social environments and settings can influence the user experience's social dimension. Therefore, they should be carefully considered while designing immersive technological experiences in public spaces involving social interactions between players.

[227]  arXiv:2404.16483 [pdf, other]
Title: Leveraging Pretrained Latent Representations for Few-Shot Imitation Learning on a Dexterous Robotic Hand
Subjects: Robotics (cs.RO)

In the context of imitation learning applied to dexterous robotic hands, the high complexity of the systems makes learning complex manipulation tasks challenging. However, the numerous datasets depicting human hands in various different tasks could provide us with better knowledge regarding human hand motion. We propose a method to leverage multiple large-scale task-agnostic datasets to obtain latent representations that effectively encode motion subtrajectories that we included in a transformer-based behavior cloning method. Our results demonstrate that employing latent representations yields enhanced performance compared to conventional behavior cloning methods, particularly regarding resilience to errors and noise in perception and proprioception. Furthermore, the proposed approach solely relies on human demonstrations, eliminating the need for teleoperation and, therefore, accelerating the data acquisition process. Accurate inverse kinematics for fingertip retargeting ensures precise transfer from human hand data to the robot, facilitating effective learning and deployment of manipulation policies. Finally, the trained policies have been successfully transferred to a real-world 23Dof robotic system.

[228]  arXiv:2404.16484 [pdf, other]
Title: Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Comments: CVPR 2024, AI for Streaming (AIS) Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec, instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation, and process images under 10ms. Out of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory-efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.

[229]  arXiv:2404.16486 [pdf, other]
Title: OpenIVM: a SQL-to-SQL Compiler for Incremental Computations
Subjects: Databases (cs.DB)

This demonstration presents a new Open Source SQL-to-SQL compiler for Incremental View Maintenance (IVM). While previous systems, such as DBToaster, implemented computational functionality for IVM in a separate system, the core principle of OpenIVM is to make use of existing SQL query processing engines and perform all IVM computations via SQL. This approach enables the integration of IVM in these systems without code duplication. Also, it eases its use in cross-system IVM, i.e. to orchestrate an HTAP system in which one (OLTP) DBMS provides insertions/updates/deletes (deltas), which are propagated using SQL into another (OLAP) DBMS, hosting materialized views. Our system compiles view definitions into SQL to eventually propagate deltas into the table that materializes the view, following the principles of DBSP. Under the hood, OpenIVM uses the DuckDB library to compile (parse, transform, optimize) the materialized view maintenance logic. We demonstrate OpenIVM in action (i) as the core of a DuckDB extension module that adds IVM functionality to it and (ii) powering cross-system IVM for HTAP, with PostgreSQL handling updates on base tables and DuckDB hosting materialized views on these.

[230]  arXiv:2404.16487 [pdf, other]
Title: Comparing Continuous and Retrospective Emotion Ratings in Remote VR Study
Comments: The paper is accepted and will be presented at QoMEX 2024
Subjects: Human-Computer Interaction (cs.HC)

This study investigates the feasibility of remote virtual reality (VR) studies conducted at home using VR headsets and video conferencing by deploying an experiment on emotion ratings. 20 participants used head-mounted displays to immerse themselves in 360{\deg} videos selected to evoke emotional responses. The research compares continuous ratings using a graphical interface to retrospective questionnaires on a digitized Likert Scale for measuring arousal and valence, both based on the self-assessment manikin (SAM). It was hypothesized that the two different rating methods would lead to significantly different values for both valence and arousal. The goal was to investigate whether continuous ratings during the experience would better reflect users' emotions compared to the post-questionnaire by mitigating biases such as the peak-end rule. The results show significant differences with moderate to strong effect sizes for valence and no significant differences for arousal with low to moderate effect sizes. This indicates the need for further investigation of the methods used to assess emotion ratings in VR studies. Overall, this study is an example of a remotely conducted VR experiment, offering insights into methods for emotion elicitation in VR by varying the timing and interface of the rating.

[231]  arXiv:2404.16489 [pdf, other]
Title: Cost-Driven Data Replication with Predictions
Comments: The formal version of this draft will appear in ACM SPAA'24 conference
Subjects: Data Structures and Algorithms (cs.DS)

This paper studies an online replication problem for distributed data access. The goal is to dynamically create and delete data copies in a multi-server system as time passes to minimize the total storage and network cost of serving access requests. We study the problem in the emergent learning-augmented setting, assuming simple binary predictions about inter-request times at individual servers. We develop an online algorithm and prove that it is ($\frac{5+\alpha}{3}$)-consistent (competitiveness under perfect predictions) and ($1 + \frac{1}{\alpha}$)-robust (competitiveness under terrible predictions), where $\alpha \in (0, 1]$ is a hyper-parameter representing the level of distrust in the predictions. We also study the impact of mispredictions on the competitive ratio of the proposed algorithm and adapt it to achieve a bounded robustness while retaining its consistency. We further establish a lower bound of $\frac{3}{2}$ on the consistency of any deterministic learning-augmented algorithm. Experimental evaluations are carried out to evaluate our algorithms using real data access traces.

[232]  arXiv:2404.16492 [pdf, ps, other]
Title: On the topology of concurrent systems
Comments: 24 pages
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO); Algebraic Topology (math.AT)

Higher-dimensional automata, i.e., pointed labeled precubical sets, are a powerful combinatorial-topological model for concurrent systems. In this paper, we show that for every (nonempty) connected polyhedron there exists a shared-variable system such that the higher-dimensional automaton modeling the state space of the system has the homotopy type of the polyhedron.

[233]  arXiv:2404.16493 [pdf, other]
Title: Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The prevalent approaches of unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes. However, the challenge arises due to the sparsity of LiDAR scans, which leads to pseudo-labels with erroneous size and position, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs Commonsense Prototype (CProto) characterized by high-quality bounding box and dense points, based on commonsense intuition. Subsequently, CPD refines the low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects by the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Besides, by training CPD on WOD and testing on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on easy and moderate car classes, respectively. These achievements position CPD in close proximity to fully supervised detectors, highlighting the significance of our method. The code will be available at https://github.com/hailanyi/CPD.

[234]  arXiv:2404.16495 [pdf, other]
Title: T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients
Comments: 15 pages and 4 figures
Subjects: Machine Learning (cs.LG)

The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. Modern learning models, while powerful, often exhibit a level of complexity that renders them opaque black boxes, resulting in a notable lack of transparency that hinders our ability to decipher their decision-making processes. Opacity challenges the interpretability and practical application of machine learning, especially in critical domains where understanding the underlying reasons is essential for informed decision-making. Explainable Artificial Intelligence (XAI) rises to meet that challenge, unraveling the complexity of black boxes by providing elucidating explanations. Among the various XAI approaches, feature attribution/importance XAI stands out for its capacity to delineate the significance of input features in the prediction process. However, most existing attribution methods have limitations, such as instability, when divergent explanations may result from similar or even the same instance. In this work, we introduce T-Explainer, a novel local additive attribution explainer based on Taylor expansion endowed with desirable properties, such as local accuracy and consistency, while stable over multiple runs. We demonstrate T-Explainer's effectiveness through benchmark experiments with well-known attribution methods. In addition, T-Explainer is developed as a comprehensive XAI framework comprising quantitative metrics to assess and visualize attribution explanations.

[235]  arXiv:2404.16496 [pdf, other]
Title: Probabilistic Multi-Layer Perceptrons for Wind Farm Condition Monitoring
Comments: 9 pages, 8 figures, 2 tables
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

We provide a condition monitoring system for wind farms, based on normal behaviour modelling using a probabilistic multi-layer perceptron with transfer learning via fine-tuning. The model predicts the output power of the wind turbine under normal behaviour based on features retrieved from supervisory control and data acquisition (SCADA) systems. Its advantages are that (i) it can be trained with SCADA data of at least a few years, (ii) it can incorporate all SCADA data of all wind turbines in a wind farm as features, (iii) it assumes that the output power follows a normal density with heteroscedastic variance and (iv) it can predict the output of one wind turbine by borrowing strength from the data of all other wind turbines in a farm. Probabilistic guidelines for condition monitoring are given via a CUSUM control chart. We illustrate the performance of our model in a real SCADA data example which provides evidence that it outperforms other probabilistic prediction models.

[236]  arXiv:2404.16500 [pdf, other]
Title: Conformal Prediction of Motion Control Performance for an Automated Vehicle in Presence of Actuator Degradations and Failures
Comments: submitted for publication
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Automated driving systems require monitoring mechanisms to ensure safe operation, especially if system components degrade or fail. Their runtime self-representation plays a key role as it provides a-priori knowledge about the system's capabilities and limitations. In this paper, we propose a data-driven approach for deriving such a self-representation model for the motion controller of an automated vehicle. A conformalized prediction model is learned and allows estimating how operational conditions as well as potential degradations and failures of the vehicle's actuators impact motion control performance. During runtime behavior generation, our predictor can provide a heuristic for determining the admissible action space.

[237]  arXiv:2404.16501 [pdf, other]
Title: 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes
Comments: arXiv admin note: substantial text overlap with arXiv:2403.12505
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we address the challenging source-free unsupervised domain adaptation (SFUDA) for pinhole-to-panoramic semantic segmentation, given only a pinhole image pre-trained model (i.e., source) and unlabeled panoramic images (i.e., target). Tackling this problem is non-trivial due to three critical challenges: 1) semantic mismatches from the distinct Field-of-View (FoV) between domains, 2) style discrepancies inherent in the UDA problem, and 3) inevitable distortion of the panoramic images. To tackle these problems, we propose 360SFUDA++ that effectively extracts knowledge from the source pinhole model with only unlabeled panoramic images and transfers the reliable knowledge to the target panoramic domain. Specifically, we first utilize Tangent Projection (TP) as it has less distortion and meanwhile slits the equirectangular projection (ERP) to patches with fixed FoV projection (FFP) to mimic the pinhole images. Both projections are shown effective in extracting knowledge from the source model. However, as the distinct projections make it less possible to directly transfer knowledge between domains, we then propose Reliable Panoramic Prototype Adaptation Module (RP2AM) to transfer knowledge at both prediction and prototype levels. RP$^2$AM selects the confident knowledge and integrates panoramic prototypes for reliable knowledge adaptation. Moreover, we introduce Cross-projection Dual Attention Module (CDAM), which better aligns the spatial and channel characteristics across projections at the feature level between domains. Both knowledge extraction and transfer processes are synchronously updated to reach the best performance. Extensive experiments on the synthetic and real-world benchmarks, including outdoor and indoor scenarios, demonstrate that our 360SFUDA++ achieves significantly better performance than prior SFUDA methods.

[238]  arXiv:2404.16502 [pdf, other]
Title: A Prototypical Expert-Driven Approach Towards Capability-Based Monitoring of Automated Driving Systems
Comments: submitted for publication
Subjects: Systems and Control (eess.SY)

Supervising the safe operation of automated vehicles is a key requirement in order to unleash their full potential in future transportation systems. In particular, previous publications have argued that SAE Level 4 vehicles should be aware of their capabilities at runtime to make appropriate behavioral decisions. In this paper, we present a framework that enables the implementation of an online capability monitor. We derive a graphical system model that captures the relationships between the quality of system elements across different architectural views. In an expert-driven approach, we parameterize Bayesian Networks based on this structure using Fuzzy Logic. Using the online monitor, we infer the quality of the system's capabilities based on technical measurements acquired at runtime. Our approach is demonstrated in the context of the UNICAR.agil research project in an urban example scenario.

[239]  arXiv:2404.16504 [pdf, ps, other]
Title: Hardware Implementation of Double Pendulum Pseudo Random Number Generator
Comments: 15 pages, 12 figure
Subjects: Cryptography and Security (cs.CR); Signal Processing (eess.SP)

The objective of this project is to utilize an FPGA board which is the CMOD A7 35t to obtain a pseudo random number which can be used for encryption. We aim to achieve this by leveraging the inherent randomness present in environmental data captured by sensors. This data will be used as a seed to initialize an algorithm implemented on the CMOD A7 35t FPGA board. The project will focus on interfacing the sensors with the FPGA and developing suitable algorithms to ensure the generated numbers exhibit strong randomness properties.

[240]  arXiv:2404.16505 [pdf, other]
Title: Efficient algorithms for regularized Poisson Non-negative Matrix Factorization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

We consider the problem of regularized Poisson Non-negative Matrix Factorization (NMF) problem, encompassing various regularization terms such as Lipschitz and relatively smooth functions, alongside linear constraints. This problem holds significant relevance in numerous Machine Learning applications, particularly within the domain of physical linear unmixing problems. A notable challenge arises from the main loss term in the Poisson NMF problem being a KL divergence, which is non-Lipschitz, rendering traditional gradient descent-based approaches inefficient. In this contribution, we explore the utilization of Block Successive Upper Minimization (BSUM) to overcome this challenge. We build approriate majorizing function for Lipschitz and relatively smooth functions, and show how to introduce linear constraints into the problem. This results in the development of two novel algorithms for regularized Poisson NMF. We conduct numerical simulations to showcase the effectiveness of our approach.

[241]  arXiv:2404.16506 [pdf, other]
Title: Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer
Comments: Accepted LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work delves into effectively utilizing existing English resources to promote DocRE studies in non-English languages, with Japanese as the representative case. As an initial attempt, we construct a dataset by transferring an English dataset to Japanese. However, models trained on such a dataset suffer from low recalls. We investigate the error cases and attribute the failure to different surface structures and semantics of documents translated from English and those written by native speakers. We thus switch to explore if the transferred dataset can assist human annotation on Japanese documents. In our proposal, annotators edit relation predictions from a model trained on the transferred dataset. Quantitative analysis shows that relation recommendations suggested by the model help reduce approximately 50% of the human edit steps compared with the previous approach. Experiments quantify the performance of existing DocRE models on our collected dataset, portraying the challenges of Japanese and cross-lingual DocRE.

[242]  arXiv:2404.16507 [pdf, other]
Title: Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Efficient visual perception using mobile systems is crucial, particularly in unknown environments such as search and rescue operations, where swift and comprehensive perception of objects of interest is essential. In such real-world applications, objects of interest are often situated in complex environments, making the selection of the 'Next Best' view based solely on maximizing visibility gain suboptimal. Semantics, providing a higher-level interpretation of perception, should significantly contribute to the selection of the next viewpoint for various perception tasks. In this study, we formulate a novel information gain that integrates both visibility gain and semantic gain in a unified form to select the semantic-aware Next-Best-View. Additionally, we design an adaptive strategy with termination criterion to support a two-stage search-and-acquisition manoeuvre on multiple objects of interest aided by a multi-degree-of-freedoms (Multi-DoFs) mobile system. Several semantically relevant reconstruction metrics, including perspective directivity and region of interest (ROI)-to-full reconstruction volume ratio, are introduced to evaluate the performance of the proposed approach. Simulation experiments demonstrate the advantages of the proposed approach over existing methods, achieving improvements of up to 27.13% for the ROI-to-full reconstruction volume ratio and a 0.88234 average perspective directivity. Furthermore, the planned motion trajectory exhibits better perceiving coverage toward the target.

[243]  arXiv:2404.16508 [pdf, other]
Title: Exploring the Dynamics of Data Transmission in 5G Networks: A Conceptual Analysis
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI)

This conceptual analysis examines the dynamics of data transmission in 5G networks. It addresses various aspects of sending data from cameras and LiDARs installed on a remote-controlled ferry to a land-based control center. The range of topics includes all stages of video and LiDAR data processing from acquisition and encoding to final decoding, all aspects of their transmission and reception via the WebRTC protocol, and all possible types of network problems such as handovers or congestion that could affect the quality of experience for end-users. A series of experiments were conducted to evaluate the key aspects of the data transmission. These include simulation-based reproducible runs and real-world experiments conducted using open-source solutions we developed: "Gymir5G" - an OMNeT++-based 5G simulation and "GstWebRTCApp" - a GStreamer-based application for adaptive control of media streams over the WebRTC protocol. One of the goals of this study is to formulate the bandwidth and latency requirements for reliable real-time communication and to estimate their approximate values. This goal was achieved through simulation-based experiments involving docking maneuvers in the Bay of Kiel, Germany. The final latency for the entire data processing pipeline was also estimated during the real tests. In addition, a series of simulation-based experiments showed the impact of key WebRTC features and demonstrated the effectiveness of the WebRTC protocol, while the conducted video codec comparison showed that the hardware-accelerated H.264 codec is the best. Finally, the research addresses the topic of adaptive communication, where the traditional congestion avoidance and deep reinforcement learning approaches were analyzed. The comparison in a sandbox scenario shows that the AI-based solution outperforms the WebRTC baseline GCC algorithm in terms of data rates, latency, and packet loss.

[244]  arXiv:2404.16510 [pdf, other]
Title: Interactive3D: Create What You Want by Interactive 3D Generation
Comments: project page: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

3D object generation has undergone significant advancements, yielding high-quality results. However, fall short of achieving precise user control, often yielding results that do not align with user expectations, thus limiting their applicability. User-envisioning 3D object generation faces significant challenges in realizing its concepts using current generative models due to limited interaction capabilities. Existing methods mainly offer two approaches: (i) interpreting textual instructions with constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications. In this work, we introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D is constructed in two cascading stages, utilizing distinct 3D representations. The first stage employs Gaussian Splatting for direct user interaction, allowing modifications and guidance of the generative direction at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. Subsequently, the Gaussian splats are transformed into InstantNGP. We introduce a novel (v) Interactive Hash Refinement module to further add details and extract the geometry in the second stage. Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation. Our project webpage is available at \url{https://interactive-3d.github.io/}.

[245]  arXiv:2404.16514 [pdf, ps, other]
Title: Adaptive Learning-based Model Predictive Control for Uncertain Interconnected Systems: A Set Membership Identification Approach
Subjects: Systems and Control (eess.SY)

We propose a novel adaptive learning-based model predictive control (MPC) scheme for interconnected systems which can be decomposed into several smaller dynamically coupled subsystems with uncertain coupling. The proposed scheme is mainly divided into two main online phases; a learning phase and an adaptation phase. Set membership identification is used in the learning phase to learn an uncertainty set that contains the coupling strength using online data. In the adaptation phase, rigid tube-based robust MPC is used to compute the optimal predicted states and inputs. Besides computing the optimal trajectories, the MPC ingredients are adapted in the adaptation phase taking the learnt uncertainty set into account. These MPC ingredients include the prestabilizing controller, the rigid tube, the tightened constraints and the terminal ingredients. The recursive feasibility of the proposed scheme as well as the stability of the corresponding closed-loop system are discussed. The developed scheme is compared in simulations to existing schemes including robust, adaptive and learning-based MPC.

[246]  arXiv:2404.16517 [pdf, other]
Title: Scalable Distributed String Sorting
Subjects: Data Structures and Algorithms (cs.DS)

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least) proportional to the number of processors $p$ or communicate the data a large number of times (at least logarithmic). We present practical and efficient algorithms for distributed-memory string sorting that scale to large $p$. Similar to state-of-the-art sorters for atomic objects, the algorithms have latency of about $p^{1/k}$ when allowing the data to be communicated $k$ times. Experiments indicate good scaling behavior on a wide range of inputs on up to 49152 cores. Overall, we achieve speedups of up to 5 over the current state-of-the-art distributed string sorting algorithms.

[247]  arXiv:2404.16518 [pdf, other]
Title: Edit Distance of Finite State Transducers
Subjects: Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)

We lift metrics over words to metrics over word-to-word transductions, by defining the distance between two transductions as the supremum of the distances of their respective outputs over all inputs. This allows to compare transducers beyond equivalence.
Two transducers are close (resp. $k$-close) with respect to a metric if their distance is finite (resp. at most $k$). Over integer-valued metrics computing the distance between transducers is equivalent to deciding the closeness and $k$-closeness problems. For common integer-valued edit distances such as, Hamming, transposition, conjugacy and Levenshtein family of distances, we show that the closeness and the $k$-closeness problems are decidable for functional transducers. Hence, the distance with respect to these metrics is also computable.
Finally, we relate the notion of distance between functions to the notions of diameter of a relation and index of a relation in another. We show that computing edit distance between functional transducers is equivalent to computing diameter of a rational relation and both are a specific instance of the index problem of rational relations.

[248]  arXiv:2404.16519 [pdf, ps, other]
Title: Unbiased Estimating Equation on Inverse Divergence and Its Conditions
Comments: Accepted to the 2024 IEEE International Symposium on Information Theory (ISIT 2024)
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)

This paper focuses on the Bregman divergence defined by the reciprocal function, called the inverse divergence. For the loss function defined by the monotonically increasing function $f$ and inverse divergence, the conditions for the statistical model and function $f$ under which the estimating equation is unbiased are clarified. Specifically, we characterize two types of statistical models, an inverse Gaussian type and a mixture of generalized inverse Gaussian type distributions, to show that the conditions for the function $f$ are different for each model. We also define Bregman divergence as a linear sum over the dimensions of the inverse divergence and extend the results to the multi-dimensional case.

[249]  arXiv:2404.16527 [pdf, ps, other]
Title: Energy Efficient Service Placement for IoT Networks
Comments: 5 pages, 5 Figures, Conference
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

In recent years, there has been a significant expansion in the Internet of Things (IoT), with a growing number of devices being connected to the internet. This has led to an increase in data collection and analysis as well as the development of new technologies and applications. The rise of IoT has also brought about new challenges, such as security concerns and energy efficiency. This study investigates a layered IoT architecture that combines fog and cloud computing, aiming to assess the impact of service placement on energy efficiency. Through simulations, we analyse energy use across Access Fog, Metro Fog, and Cloud Data Centre layers for different IoT request volumes. Findings indicate that Access Fog is optimal for single requests, while Metro Fog efficiently manages higher demands from multiple devices. The study emphasizes the need for adaptive service deployment, responsive to network load variations, to improve energy efficiency. Hence, we propose the implementation of dynamic service placement strategies within Internet of Things (IoT) environments.

[250]  arXiv:2404.16529 [pdf, other]
Title: Vision-based robot manipulation of transparent liquid containers in a laboratory setting
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Laboratory processes involving small volumes of solutions and active ingredients are often performed manually due to challenges in automation, such as high initial costs, semi-structured environments and protocol variability. In this work, we develop a flexible and cost-effective approach to address this gap by introducing a vision-based system for liquid volume estimation and a simulation-driven pouring method particularly designed for containers with small openings. We evaluate both components individually, followed by an applied real-world integration of cell culture automation using a UR5 robotic arm. Our work is fully reproducible: we share our code at at \url{https://github.com/DaniSchober/LabLiquidVision} and the newly introduced dataset LabLiquidVolume is available at https://data.dtu.dk/articles/dataset/LabLiquidVision/25103102.

[251]  arXiv:2404.16530 [pdf, other]
Title: On the Political Economy of Link-based Web Search
Subjects: Computers and Society (cs.CY)

Web search engines arguably form the most popular data-driven systems in contemporary society. They wield a considerable power by functioning as gatekeepers of the Web, with most user journeys on the Web beginning with them. Starting from the late 1990s, search engines have been dominated by the paradigm of link-based web search. In this paper, we critically analyze the political economy of the paradigm of link-based web search, drawing upon insights and methodologies from critical political economy. We draw several insights on how link-based web search has led to phenomena that favor capital through long-term structural changes on the Web, and how it has led to accentuating unpaid digital labor and ecologically unsustainable practices, among several others. We show how contemporary observations on the degrading quality of link-based web search can be traced back to the internal contradictions with the paradigm, and how such socio-technical phenomena may lead to a disutility of the link-based web search model. Our contribution is primarily on enhancing the understanding of the political economy of link-based web search, and laying bare the phenomena at work, and implicitly catalyze the search for alternative models.

[252]  arXiv:2404.16532 [pdf, other]
Title: Global Concept Explanations for Graphs by Contrastive Learning
Comments: 25 pages, 9 figures, accepted at xAI world conference 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Beyond improving trust and validating model fairness, xAI practices also have the potential to recover valuable scientific insights in application domains where little to no prior human intuition exists. To that end, we propose a method to extract global concept explanations from the predictions of graph neural networks to develop a deeper understanding of the tasks underlying structure-property relationships. We identify concept explanations as dense clusters in the self-explaining Megan models subgraph latent space. For each concept, we optimize a representative prototype graph and optionally use GPT-4 to provide hypotheses about why each structure has a certain effect on the prediction. We conduct computational experiments on synthetic and real-world graph property prediction tasks. For the synthetic tasks we find that our method correctly reproduces the structural rules by which they were created. For real-world molecular property regression and classification tasks, we find that our method rediscovers established rules of thumb. More specifically, our results for molecular mutagenicity prediction indicate more fine-grained resolution of structural details than existing explainability methods, consistent with previous results from chemistry literature. Overall, our results show promising capability to extract the underlying structure-property relationships for complex graph property prediction tasks.

[253]  arXiv:2404.16534 [pdf, other]
Title: SIDEs: Separating Idealization from Deceptive Explanations in xAI
Authors: Emily Sullivan
Comments: 18 pages, 3 figures, 2 tables Forthcoming in FAccT'24
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Explainable AI (xAI) methods are important for establishing trust in using black-box models. However, recent criticism has mounted against current xAI methods that they disagree, are necessarily false, and can be manipulated, which has started to undermine the deployment of black-box models. Rudin (2019) goes so far as to say that we should stop using black-box models altogether in high-stakes cases because xAI explanations "must be wrong". However, strict fidelity to the truth is historically not a desideratum in science. Idealizations -- the intentional distortions introduced to scientific theories and models -- are commonplace in the natural sciences and are seen as a successful scientific tool. Thus, it is not falsehood qua falsehood that is the issue. In this paper, I outline the need for xAI research to engage in idealization evaluation. Drawing on the use of idealizations in the natural sciences and philosophy of science, I introduce a novel framework for evaluating whether xAI methods engage in successful idealizations or deceptive explanations (SIDEs). SIDEs evaluates whether the limitations of xAI methods, and the distortions that they introduce, can be part of a successful idealization or are indeed deceptive distortions as critics suggest. I discuss the role that existing research can play in idealization evaluation and where innovation is necessary. Through a qualitative analysis we find that leading feature importance methods and counterfactual explanations are subject to idealization failure and suggest remedies for ameliorating idealization failure.

[254]  arXiv:2404.16536 [pdf, other]
Title: 3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics. However, previous 3D face modeling methods face a challenge as they demand specific labels to effectively disentangle these factors. This becomes particularly problematic when integrating multiple 3D face datasets to improve the generalization of the model. Addressing this issue, this paper introduces a Weakly-Supervised Disentanglement Framework, denoted as WSDF, to facilitate the training of controllable 3D face models without an overly stringent labeling requirement. Adhering to the paradigm of Variational Autoencoders (VAEs), the proposed model achieves disentanglement of identity and expression controlling factors through a two-branch encoder equipped with dedicated identity-consistency prior. It then faithfully re-entangles these factors via a tensor-based combination mechanism. Notably, the introduction of the Neutral Bank allows precise acquisition of subject-specific information using only identity labels, thereby averting degeneration due to insufficient supervision. Additionally, the framework incorporates a label-free second-order loss function for the expression factor to regulate deformation space and eliminate extraneous information, resulting in enhanced disentanglement. Extensive experiments have been conducted to substantiate the superior performance of WSDF. Our code is available at https://github.com/liguohao96/WSDF.

[255]  arXiv:2404.16537 [pdf, other]
Title: Local training and enrichment based on a residual localization strategy
Subjects: Numerical Analysis (math.NA)

To efficiently tackle parametrized multi and/or large scale problems, we propose an adaptive localized model order reduction framework combining both local offline training and local online enrichment with localized error control. For the latter, we adapt the residual localization strategy introduced in [Buhr, Engwer, Ohlberger, Rave, SIAM J. Sci. Comput., 2017] which allows to derive a localized a posteriori error estimator that can be employed to adaptively enrich the reduced solution space locally where needed. Numerical experiments demonstrate the potential of the proposed approach.

[256]  arXiv:2404.16538 [pdf, other]
Title: OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent advances in Vision and Language Models (VLMs) have improved open-world 3D representation, facilitating 3D zero-shot capability in unseen categories. Existing open-world methods pre-train an extra 3D encoder to align features from 3D data (e.g., depth maps or point clouds) with CAD-rendered images and corresponding texts. However, the limited color and texture variations in CAD images can compromise the alignment robustness. Furthermore, the volume discrepancy between pre-training datasets of the 3D encoder and VLM leads to sub-optimal 2D to 3D knowledge transfer. To overcome these issues, we propose OpenDlign, a novel framework for learning open-world 3D representations, that leverages depth-aligned images generated from point cloud-projected depth maps. Unlike CAD-rendered images, our generated images provide rich, realistic color and texture diversity while preserving geometric and semantic consistency with the depth maps. OpenDlign also optimizes depth map projection and integrates depth-specific text prompts, improving 2D VLM knowledge adaptation for 3D learning efficient fine-tuning. Experimental results show that OpenDlign significantly outperforms existing benchmarks in zero-shot and few-shot 3D tasks, exceeding prior scores by 8.0% on ModelNet40 and 16.4% on OmniObject3D with just 6 million tuned parameters. Moreover, integrating generated depth-aligned images into existing 3D learning pipelines consistently improves their performance.

[257]  arXiv:2404.16540 [pdf, other]
Title: Approximation Algorithm of Minimum All-Ones Problem for Arbitrary Graphs
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

Let $G=(V, E)$ be a graph and let each vertex of $G$ has a lamp and a button. Each button can be of $\sigma^+$-type or $\sigma$-type.
Assume that initially some lamps are on and others are off. The button on vertex $x$ is of $\sigma^+$-type ($\sigma$-type, respectively) if pressing the button changes the lamp states on $x$ and on its neighbors in $G$ (the lamp states on the neighbors of $x$ only, respectively). Assume that there is a set $X\subseteq V$ such that pressing buttons on vertices of $X$ lights all lamps on vertices of $G$. In particular, it is known to hold when initially all lamps are off and all buttons are of $\sigma^+$-type.
Finding such a set $X$ of the smallest size is NP-hard even if initially all lamps are off and all buttons are of $\sigma^+$-type. Using a linear algebraic approach we design a polynomial-time approximation algorithm for the problem such that for the set $X$ constructed by the algorithm, we have $|X|\le \min\{r,(|V|+{\rm opt})/2\},$ where $r$ is the rank of a (modified) adjacent matrix of $G$ and ${\rm opt}$ is the size of an optimal solution to the problem.
To the best of our knowledge, this is the first polynomial-time approximation algorithm for the problem with a nontrivial approximation guarantee.

[258]  arXiv:2404.16548 [pdf, other]
Title: Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System
Comments: 12 pages including highlights and graphical abstract, submitted to Expert Systems with Applications journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a novel approach to address the problem of camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Precisely, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation method to convert these features into 3D space. We then fuse them with extracted radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the NuScenes dataset. We compare our approach to both single-sensor performance and current state-of-the-art fusion methods. Our results show that the proposed approach achieves superior performance over single-sensor solutions and could directly compete with other top-level fusion methods.

[259]  arXiv:2404.16549 [pdf, other]
Title: Application of Long-Short Term Memory and Convolutional Neural Networks for Real-Time Bridge Scour Forecast
Subjects: Machine Learning (cs.LG)

Scour around bridge piers is a critical challenge for infrastructures around the world. In the absence of analytical models and due to the complexity of the scour process, it is difficult for current empirical methods to achieve accurate predictions. In this paper, we exploit the power of deep learning algorithms to forecast the scour depth variations around bridge piers based on historical sensor monitoring data, including riverbed elevation, flow elevation, and flow velocity. We investigated the performance of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models for real-time scour forecasting using data collected from bridges in Alaska and Oregon from 2006 to 2021. The LSTM models achieved mean absolute error (MAE) ranging from 0.1m to 0.5m for predicting bed level variations a week in advance, showing a reasonable performance. The Fully Convolutional Network (FCN) variant of CNN outperformed other CNN configurations, showing a comparable performance to LSTMs with significantly lower computational costs. We explored various innovative random-search heuristics for hyperparameter tuning and model optimisation which resulted in reduced computational cost compared to grid-search method. The impact of different combinations of sensor features on scour prediction showed the significance of the historical time series of scour for predicting upcoming events. Overall, this study provides a greater understanding of the potential of Deep Learning (DL) for real-time scour forecasting and early warning in bridges with diverse scour and flow characteristics including riverine and tidal/coastal bridges.

[260]  arXiv:2404.16551 [pdf, other]
Title: Surprisingly Strong Performance Prediction with Neural Graph Features
Comments: 45 pages, 30 figures
Subjects: Machine Learning (cs.LG)

Performance prediction has been a key part of the neural architecture search (NAS) process, allowing to speed up NAS algorithms by avoiding resource-consuming network training. Although many performance predictors correlate well with ground truth performance, they require training data in the form of trained networks. Recently, zero-cost proxies have been proposed as an efficient method to estimate network performance without any training. However, they are still poorly understood, exhibit biases with network properties, and their performance is limited. Inspired by the drawbacks of zero-cost proxies, we propose neural graph features (GRAF), simple to compute properties of architectural graphs. GRAF offers fast and interpretable performance prediction while outperforming zero-cost proxies and other common encodings. In combination with other zero-cost proxies, GRAF outperforms most existing performance predictors at a fraction of the cost.

[261]  arXiv:2404.16552 [pdf, other]
Title: Efficient Solution of Point-Line Absolute Pose
Comments: CVPR 2024, 11 pages, 8 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We revisit certain problems of pose estimation based on 3D--2D correspondences between features which may be points or lines. Specifically, we address the two previously-studied minimal problems of estimating camera extrinsics from $p \in \{ 1, 2 \}$ point--point correspondences and $l=3-p$ line--line correspondences. To the best of our knowledge, all of the previously-known practical solutions to these problems required computing the roots of degree $\ge 4$ (univariate) polynomials when $p=2$, or degree $\ge 8$ polynomials when $p=1.$ We describe and implement two elementary solutions which reduce the degrees of the needed polynomials from $4$ to $2$ and from $8$ to $4$, respectively. We show experimentally that the resulting solvers are numerically stable and fast: when compared to the previous state-of-the art, we may obtain nearly an order of magnitude speedup. The code is available at \url{https://github.com/petrhruby97/efficient\_absolute}

[262]  arXiv:2404.16553 [pdf, other]
Title: RE-RecSys: An End-to-End system for recommending properties in Real-Estate domain
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

We propose an end-to-end real-estate recommendation system, RE-RecSys, which has been productionized in real-world industry setting. We categorize any user into 4 categories based on available historical data: i) cold-start users; ii) short-term users; iii) long-term users; and iv) short-long term users. For cold-start users, we propose a novel rule-based engine that is based on the popularity of locality and user preferences. For short-term users, we propose to use content-filtering model which recommends properties based on recent interactions of users. For long-term and short-long term users, we propose a novel combination of content and collaborative filtering based approach which can be easily productionized in the real-world scenario. Moreover, based on the conversion rate, we have designed a novel weighing scheme for different impressions done by users on the platform for the training of content and collaborative models. Finally, we show the efficiency of the proposed pipeline, RE-RecSys, on a real-world property and clickstream dataset collected from leading real-estate platform in India. We show that the proposed pipeline is deployable in real-world scenario with an average latency of <40 ms serving 1000 rpm.

[263]  arXiv:2404.16554 [pdf, other]
Title: Generalized Multiscale Finite Element Method for discrete network (graph) models
Authors: Maria Vasilyeva
Subjects: Numerical Analysis (math.NA)

In this paper, we consider a time-dependent discrete network model with highly varying connectivity. The approximation by time is performed using an implicit scheme. We propose the coarse scale approximation construction of network models based on the Generalized Multiscale Finite Element Method. An accurate coarse-scale approximation is generated by solving local spectral problems in sub-networks. Convergence analysis of the proposed method is presented for semi-discrete and discrete network models. We establish the stability of the multiscale discrete network. Numerical results are presented for structured and random heterogeneous networks.

[264]  arXiv:2404.16555 [pdf, other]
Title: MMGRec: Multimodal Generative Recommendation with Transformer Model
Subjects: Information Retrieval (cs.IR)

Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information. Previous studies commonly employ an embed-and-retrieve paradigm: learning user and item representations in the same embedding space, then retrieving similar candidate items for a user via embedding inner product. However, this paradigm suffers from inference cost, interaction modeling, and false-negative issues. Toward this end, we propose a new MMGRec model to introduce a generative paradigm into multimodal recommendation. Specifically, we first devise a hierarchical quantization method Graph RQ-VAE to assign Rec-ID for each item from its multimodal and CF information. Consisting of a tuple of semantically meaningful tokens, Rec-ID serves as the unique identifier of each item. Afterward, we train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences. The generative paradigm is qualified since this model systematically predicts the tuple of tokens identifying the recommended item in an autoregressive manner. Moreover, a relation-aware self-attention mechanism is devised for the Transformer to handle non-sequential interaction sequences, which explores the element pairwise relation to replace absolute positional encoding. Extensive experiments evaluate MMGRec's effectiveness compared with state-of-the-art methods.

[265]  arXiv:2404.16556 [pdf, other]
Title: Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Few-shot image synthesis entails generating diverse and realistic images of novel categories using only a few example images. While multiple recent efforts in this direction have achieved impressive results, the existing approaches are dependent only upon the few novel samples available at test time in order to generate new images, which restricts the diversity of the generated images. To overcome this limitation, we propose Conditional Distribution Modelling (CDM) -- a framework which effectively utilizes Diffusion models for few-shot image generation. By modelling the distribution of the latent space used to condition a Diffusion process, CDM leverages the learnt statistics of the training data to get a better approximation of the unseen class distribution, thereby removing the bias arising due to limited number of few shot samples. Simultaneously, we devise a novel inversion based optimization strategy that further improves the approximated unseen class distribution, and ensures the fidelity of the generated samples to the unseen class. The experimental results on four benchmark datasets demonstrate the effectiveness of our proposed CDM for few-shot generation.

[266]  arXiv:2404.16557 [pdf, other]
Title: Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples
Comments: arXiv admin note: substantial text overlap with arXiv:2401.11170
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Despite the exceptional performance of multi-modal large language models (MLLMs), their deployment requires substantial computational resources. Once malicious users induce high energy consumption and latency time (energy-latency cost), it will exhaust computational resources and harm availability of service. In this paper, we investigate this vulnerability for MLLMs, particularly image-based and video-based ones, and aim to induce high energy-latency cost during inference by crafting an imperceptible perturbation. We find that high energy-latency cost can be manipulated by maximizing the length of generated sequences, which motivates us to propose verbose samples, including verbose images and videos. Concretely, two modality non-specific losses are proposed, including a loss to delay end-of-sequence (EOS) token and an uncertainty loss to increase the uncertainty over each generated token. In addition, improving diversity is important to encourage longer responses by increasing the complexity, which inspires the following modality specific loss. For verbose images, a token diversity loss is proposed to promote diverse hidden states. For verbose videos, a frame feature diversity loss is proposed to increase the feature diversity among frames. To balance these losses, we propose a temporal weight adjustment algorithm. Experiments demonstrate that our verbose samples can largely extend the length of generated sequences.

[267]  arXiv:2404.16558 [pdf, other]
Title: DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation
Comments: 4 pages, 3 Figures, published to IET Electronic Letters
Journal-ref: Electronics Letters (ISSN: 00135194), jaar: 2024, volume: 60, nummer: 8, startpagina: ?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

This paper presents DeepKalPose, a novel approach for enhancing temporal consistency in monocular vehicle pose estimation applied on video through a deep-learning-based Kalman Filter. By integrating a Bi-directional Kalman filter strategy utilizing forward and backward time-series processing, combined with a learnable motion model to represent complex motion patterns, our method significantly improves pose accuracy and robustness across various conditions, particularly for occluded or distant vehicles. Experimental validation on the KITTI dataset confirms that DeepKalPose outperforms existing methods in both pose accuracy and temporal consistency.

[268]  arXiv:2404.16561 [pdf, ps, other]
Title: Research on geometric figure classification algorithm based on Deep Learning
Comments: 6 pages,9 figures
Journal-ref: Scientific Journal of Intelligent Systems Research,Volume 4 Issue 6, 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, with the rapid development of computer information technology, the development of artificial intelligence has been accelerating. The traditional geometry recognition technology is relatively backward and the recognition rate is low. In the face of massive information database, the traditional algorithm model inevitably has the problems of low recognition accuracy and poor performance. Deep learning theory has gradually become a very important part of machine learning. The implementation of convolutional neural network (CNN) reduces the difficulty of graphics generation algorithm. In this paper, using the advantages of lenet-5 architecture sharing weights and feature extraction and classification, the proposed geometric pattern recognition algorithm model is faster in the training data set. By constructing the shared feature parameters of the algorithm model, the cross-entropy loss function is used in the recognition process to improve the generalization of the model and improve the average recognition accuracy of the test data set.

[269]  arXiv:2404.16562 [pdf, ps, other]
Title: L is different from NP
Subjects: Computational Complexity (cs.CC)

We prove that the class LOGSPACE (L, for short) is different from the class NP.

[270]  arXiv:2404.16563 [pdf, other]
Title: Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Subjects: Computation and Language (cs.CL)

Large Language Models (LLMs) offer the potential for automatic time series analysis and reporting, which is a critical task across many domains, spanning healthcare, finance, climate, energy, and many more. In this paper, we propose a framework for rigorously evaluating the capabilities of LLMs on time series understanding, encompassing both univariate and multivariate forms. We introduce a comprehensive taxonomy of time series features, a critical framework that delineates various characteristics inherent in time series data. Leveraging this taxonomy, we have systematically designed and synthesized a diverse dataset of time series, embodying the different outlined features. This dataset acts as a solid foundation for assessing the proficiency of LLMs in comprehending time series. Our experiments shed light on the strengths and limitations of state-of-the-art LLMs in time series understanding, revealing which features these models readily comprehend effectively and where they falter. In addition, we uncover the sensitivity of LLMs to factors including the formatting of the data, the position of points queried within a series and the overall time series length.

[271]  arXiv:2404.16565 [pdf, other]
Title: PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages
Comments: This paper has been accepted at FSE 2024
Subjects: Software Engineering (cs.SE)

A package's source code repository records the development history of the package, providing indispensable information for the use and risk monitoring of the package. However, a package release often misses its source code repository due to the separation of the package's development platform from its distribution platform. Existing tools retrieve the release's repository information from its metadata, which suffers from two limitations: the metadata may not contain or contain wrong information. Our analysis shows that existing tools can only retrieve repository information for up to 70.5% of PyPI releases. To address the limitations, this paper proposes PyRadar, a novel framework that utilizes the metadata and source distribution to retrieve and validate the repository information for PyPI releases. We start with an empirical study to compare four existing tools on 4,227,425 PyPI releases and analyze phantom files (files appearing in the release's distribution but not in the release's repository) in 14,375 correct package-repository links and 2,064 incorrect links. Based on the findings, we design PyRadar with three components, i.e., Metadata-based Retriever, Source Code Repository Validator, and Source Code-based Retriever. In particular, the Metadata-based Retriever combines best practices of existing tools and successfully retrieves repository information from the metadata for 72.1% of PyPI releases. The Source Code Repository Validator applies common machine learning algorithms on six crafted features and achieves an AUC of up to 0.995. The Source Code-based Retriever queries World of Code with the SHA-1 hashes of all Python files in the release's source distribution and retrieves repository information for 90.2% of packages in our dataset with an accuracy of 0.970. Both practitioners and researchers can employ the PyRadar to better use PyPI packages.

[272]  arXiv:2404.16571 [pdf, other]
Title: MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images
Comments: 9 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure lost, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on three datasets demonstrate that our proposed MonoPCC shows a great robustness to the brightness inconsistency, and exceeds other state-of-the-arts by reducing the absolute relative error by at least 7.27%.

[273]  arXiv:2404.16572 [pdf, ps, other]
Title: ReliK: A Reliability Measure for Knowledge Graph Embeddings
Subjects: Social and Information Networks (cs.SI)

Can we assess a priori how well a knowledge graph embedding will perform on a specific downstream task and in a specific part of the knowledge graph? Knowledge graph embeddings (KGEs) represent entities (e.g., "da Vinci," "Mona Lisa") and relationships (e.g., "painted") of a knowledge graph (KG) as vectors. KGEs are generated by optimizing an embedding score, which assesses whether a triple (e.g., "da Vinci," "painted," "Mona Lisa") exists in the graph. KGEs have been proven effective in a variety of web-related downstream tasks, including, for instance, predicting relationships among entities. However, the problem of anticipating the performance of a given KGE in a certain downstream task and locally to a specific individual triple, has not been tackled so far.
In this paper, we fill this gap with ReliK, a Reliability measure for KGEs. ReliK relies solely on KGE embedding scores, is task- and KGE-agnostic, and requires no further KGE training. As such, it is particularly appealing for semantic web applications which call for testing multiple KGE methods on various parts of the KG and on each individual downstream task. Through extensive experiments, we attest that ReliK correlates well with both common downstream tasks, such as tail or relation prediction and triple classification, as well as advanced downstream tasks, such as rule mining and question answering, while preserving locality.

[274]  arXiv:2404.16573 [pdf, other]
Title: Multi-Scale Representations by Varying Window Attention for Semantic Segmentation
Comments: ICLR2024 Poster
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-scale learning is central to semantic segmentation. We visualize the effective receptive field (ERF) of canonical multi-scale representations and point out two risks in learning them: scale inadequacy and field inactivation. A novel multi-scale learner, varying window attention (VWA), is presented to address these issues. VWA leverages the local window attention (LWA) and disentangles LWA into the query window and context window, allowing the context's scale to vary for the query to learn representations at multiple scales. However, varying the context to large-scale windows (enlarging ratio R) can significantly increase the memory footprint and computation cost (R^2 times larger than LWA). We propose a simple but professional re-scaling strategy to zero the extra induced cost without compromising performance. Consequently, VWA uses the same cost as LWA to overcome the receptive limitation of the local window. Furthermore, depending on VWA and employing various MLPs, we introduce a multi-scale decoder (MSD), VWFormer, to improve multi-scale representations for semantic segmentation. VWFormer achieves efficiency competitive with the most compute-friendly MSDs, like FPN and MLP decoder, but performs much better than any MSDs. For instance, using nearly half of UPerNet's computation, VWFormer outperforms it by 1.0%-2.5% mIoU on ADE20K. With little extra overhead, ~10G FLOPs, Mask2Former armed with VWFormer improves by 1.0%-1.3%.

[275]  arXiv:2404.16574 [pdf, other]
Title: Exploring Internal Numeracy in Language Models: A Case Study on ALBERT
Comments: 4 pages + references, 4 figures. Accepted for publication at the MathNLP Workshop at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

It has been found that Transformer-based language models have the ability to perform basic quantitative reasoning. In this paper, we propose a method for studying how these models internally represent numerical data, and use our proposal to analyze the ALBERT family of language models. Specifically, we extract the learned embeddings these models use to represent tokens that correspond to numbers and ordinals, and subject these embeddings to Principal Component Analysis (PCA). PCA results reveal that ALBERT models of different sizes, trained and initialized separately, consistently learn to use the axes of greatest variation to represent the approximate ordering of various numerical concepts. Numerals and their textual counterparts are represented in separate clusters, but increase along the same direction in 2D space. Our findings illustrate that language models, trained purely to model text, can intuit basic mathematical concepts, opening avenues for NLP applications that intersect with quantitative reasoning.

[276]  arXiv:2404.16576 [pdf, other]
Title: Implicit-Explicit schemes for decoupling multicontinuum problems in porous media
Authors: Maria Vasilyeva
Subjects: Numerical Analysis (math.NA)

In this work, we present an efficient way to decouple the multicontinuum problems. To construct decoupled schemes, we propose Implicit-Explicit time approximation in general form and study them for the fine-scale and coarse-scale space approximations. We use a finite-volume method for fine-scale approximation, and the nonlocal multicontinuum (NLMC) method is used to construct an accurate and physically meaningful coarse-scale approximation. The NLMC method is an accurate technique to develop a physically meaningful coarse scale model based on defining the macroscale variables. The multiscale basis functions are constructed in local domains by solving constraint energy minimization problems and projecting the system to the coarse grid. The resulting basis functions have exponential decay properties and lead to the accurate approximation on a coarse grid. We construct a fully Implicit time approximation for semi-discrete systems arising after fine-scale and coarse-scale space approximations. We investigate the stability of the two and three-level schemes for fully Implicit and Implicit-Explicit time approximations schemes for multicontinuum problems in fractured porous media. We show that combining the decoupling technique with multiscale approximation leads to developing an accurate and efficient solver for multicontinuum problems.

[277]  arXiv:2404.16577 [pdf, other]
Title: Stokes-Brinkman-Darcy models for fluid-porous systems: derivation, analysis and validation
Comments: 20 pages, 5 figures
Subjects: Numerical Analysis (math.NA)

Flow interaction between a plain-fluid region in contact with a porous layer attracted significant attention from modelling and analysis sides due to numerous applications in biology, environment and industry. In the most widely used coupled model, fluid flow is described by the Stokes equations in the free-flow domain and Darcy's law in the porous medium, and complemented by the appropriate interface conditions. However, traditional coupling concepts are restricted, with a few exceptions, to one-dimensional flows parallel to the fluid-porous interface. In this work, we use an alternative approach to model interaction between the plain-fluid domain and porous medium by considering a transition zone, and propose the full- and hybrid-dimensional Stokes-Brinkman-Darcy models. In the first case, the equi-dimensional Brinkman equations are considered in the transition region, and the appropriate interface conditions are set on the top and bottom of the transition zone. In the latter case, we perform a dimensional model reduction by averaging the Brinkman equations in the normal direction and using the proposed transmission conditions. The well-posedness of both coupled problems is proved, and some numerical simulations are carried out in order to validate the concepts.

[278]  arXiv:2404.16578 [pdf, other]
Title: Road Surface Friction Estimation for Winter Conditions Utilising General Visual Features
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In below freezing winter conditions, road surface friction can greatly vary based on the mixture of snow, ice, and water on the road. Friction between the road and vehicle tyres is a critical parameter defining vehicle dynamics, and therefore road surface friction information is essential to acquire for several intelligent transportation applications, such as safe control of automated vehicles or alerting drivers of slippery road conditions. This paper explores computer vision-based evaluation of road surface friction from roadside cameras. Previous studies have extensively investigated the application of convolutional neural networks for the task of evaluating the road surface condition from images. Here, we propose a hybrid deep learning architecture, WCamNet, consisting of a pretrained visual transformer model and convolutional blocks. The motivation of the architecture is to combine general visual features provided by the transformer model, as well as finetuned feature extraction properties of the convolutional blocks. To benchmark the approach, an extensive dataset was gathered from national Finnish road infrastructure network of roadside cameras and optical road surface friction sensors. Acquired results highlight that the proposed WCamNet outperforms previous approaches in the task of predicting the road surface friction from the roadside camera images.

[279]  arXiv:2404.16579 [pdf, other]
Title: Neural Interaction Energy for Multi-Agent Trajectory Prediction
Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Maintaining temporal stability is crucial in multi-agent trajectory prediction. Insufficient regularization to uphold this stability often results in fluctuations in kinematic states, leading to inconsistent predictions and the amplification of errors. In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE). This framework assesses the interactive motion of agents by employing neural interaction energy, which captures the dynamics of interactions and illustrates their influence on the future trajectories of agents. To bolster temporal stability, we introduce two constraints: inter-agent interaction constraint and intra-agent motion constraint. These constraints work together to ensure temporal stability at both the system and agent levels, effectively mitigating prediction fluctuations inherent in multi-agent systems. Comparative evaluations against previous methods on four diverse datasets highlight the superior prediction accuracy and generalization capabilities of our model.

[280]  arXiv:2404.16581 [pdf, other]
Title: AudioScenic: Audio-Driven Video Scene Editing
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Audio-driven visual scene editing endeavors to manipulate the visual background while leaving the foreground content unchanged, according to the given audio signals. Unlike current efforts focusing primarily on image editing, audio-driven video scene editing has not been extensively addressed. In this paper, we introduce AudioScenic, an audio-driven framework designed for video scene editing. AudioScenic integrates audio semantics into the visual scene through a temporal-aware audio semantic injection process. As our focus is on background editing, we further introduce a SceneMasker module, which maintains the integrity of the foreground content during the editing process. AudioScenic exploits the inherent properties of audio, namely, audio magnitude and frequency, to guide the editing process, aiming to control the temporal dynamics and enhance the temporal consistency. First, we present an audio Magnitude Modulator module that adjusts the temporal dynamics of the scene in response to changes in audio magnitude, enhancing the visual dynamics. Second, the audio Frequency Fuser module is designed to ensure temporal consistency by aligning the frequency of the audio with the dynamics of the video scenes, thus improving the overall temporal coherence of the edited videos. These integrated features enable AudioScenic to not only enhance visual diversity but also maintain temporal consistency throughout the video. We present a new metric named temporal score for more comprehensive validation of temporal consistency. We demonstrate substantial advancements of AudioScenic over competing methods on DAVIS and Audioset datasets.

[281]  arXiv:2404.16583 [pdf, other]
Title: Fast Machine-Precision Spectral Likelihoods for Stationary Time Series
Subjects: Numerical Analysis (math.NA); Computation (stat.CO); Methodology (stat.ME)

We provide in this work an algorithm for approximating a very broad class of symmetric Toeplitz matrices to machine precision in $\mathcal{O}(n \log n)$ time. In particular, for a Toeplitz matrix $\mathbf{\Sigma}$ with values $\mathbf{\Sigma}_{j,k} = h_{|j-k|} = \int_{-1/2}^{1/2} e^{2 \pi i |j-k| \omega} S(\omega) \mathrm{d} \omega$ where $S(\omega)$ is piecewise smooth, we give an approximation $\mathbf{\mathcal{F}} \mathbf{\Sigma} \mathbf{\mathcal{F}}^H \approx \mathbf{D} + \mathbf{U} \mathbf{V}^H$, where $\mathbf{\mathcal{F}}$ is the DFT matrix, $\mathbf{D}$ is diagonal, and the matrices $\mathbf{U}$ and $\mathbf{V}$ are in $\mathbb{C}^{n \times r}$ with $r \ll n$. Studying these matrices in the context of time series, we offer a theoretical explanation of this structure and connect it to existing spectral-domain approximation frameworks. We then give a complete discussion of the numerical method for assembling the approximation and demonstrate its efficiency for improving Whittle-type likelihood approximations, including dramatic examples where a correction of rank $r = 2$ to the standard Whittle approximation increases the accuracy from $3$ to $14$ digits for a matrix $\mathbf{\Sigma} \in \mathbb{R}^{10^5 \times 10^5}$. The method and analysis of this work applies well beyond time series analysis, providing an algorithm for extremely accurate direct solves with a wide variety of symmetric Toeplitz matrices. The analysis employed here largely depends on asymptotic expansions of oscillatory integrals, and also provides a new perspective on when existing spectral-domain approximation methods for Gaussian log-likelihoods can be particularly problematic.

[282]  arXiv:2404.16584 [pdf, other]
Title: Numerical integrators for confined Langevin dynamics
Subjects: Numerical Analysis (math.NA); Probability (math.PR)

We derive and analyze numerical methods for weak approximation of underdamped (kinetic) Langevin dynamics in bounded domains. First-order methods are based on an Euler-type scheme interlaced with collisions with the boundary. To achieve second order, composition schemes are derived based on decomposition of the generator into collisional drift, impulse, and stochastic momentum evolution. In a deterministic setting, this approach would typically lead to first-order approximation, even in symmetric compositions, but we find that the stochastic method can provide second-order weak approximation with a single gradient evaluation, both at finite times and in the ergodic limit. We provide theoretical and numerical justification for this observation using model problems and compare and contrast the numerical performance of different choices of the ordering of the terms in the splitting scheme.

[283]  arXiv:2404.16587 [pdf, other]
Title: Understanding Privacy Risks of Embeddings Induced by Large Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. However, such a solution risks compromising privacy, as recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models. The significant advantage of LLMs over traditional pre-trained models may exacerbate these concerns. To this end, we investigate the effectiveness of reconstructing original knowledge and predicting entity attributes from these embeddings when LLMs are employed. Empirical findings indicate that LLMs significantly improve the accuracy of two evaluated tasks over those from pre-trained models, regardless of whether the texts are in-distribution or out-of-distribution. This underscores a heightened potential for LLMs to jeopardize user privacy, highlighting the negative consequences of their widespread use. We further discuss preliminary strategies to mitigate this risk.

[284]  arXiv:2404.16588 [pdf, ps, other]
Title: Proving Behavioural Apartness
Subjects: Logic in Computer Science (cs.LO)

Bisimilarity is a central notion for coalgebras. In recent work, Geuvers and Jacobs suggest to focus on apartness, which they define by dualising coalgebraic bisimulations. This yields the possibility of finite proofs of distinguishability for a wide variety of state-based systems.
We propose behavioural apartness, defined by dualising behavioural equivalence rather than bisimulations. A motivating example is the subdistribution functor, where the proof system based on bisimilarity requires an infinite quantification over couplings, whereas behavioural apartness instantiates to a finite rule. In addition, we provide optimised proof rules for behavioural apartness and show their use in several examples.

[285]  arXiv:2404.16592 [pdf, ps, other]
Title: Uninterrupted Maximum Flow on Signalized Traffic Networks
Comments: 31 pages, 16 figures, 3 tables
Subjects: Systems and Control (eess.SY)

This paper describes a traffic signal control procedure that allows motorists who travel at a recommended speed on suburban arterial two-way roads with a common cycle-time to make every traffic signal. A road-to-traveler-feedback-device advises motorists how fast they should travel to do this. Signalized arterial roads where vehicles that travel at the recommended speed make every traffic signal are termed Ride-the-Green-Wave (RGW) roads. Left-turn-arounds allow vehicles to turn left from one two-way RGW-road to an intersecting/orthogonal two-way RGW-road while allowing maximum flow on the intersecting RGW-roads. In addition to introducing novel traffic signal control strategies, the methods presented in this paper have implications for: road network design, public transport control, connected and automated vehicles and environmental impacts.

[286]  arXiv:2404.16609 [pdf, other]
Title: SFMViT: SlowFast Meet ViT in Chaotic World
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The task of spatiotemporal action localization in chaotic scenes is a challenging task toward advanced video understanding. Paving the way with high-quality video feature extraction and enhancing the precision of detector-predicted anchors can effectively improve model performance. To this end, we propose a high-performance dual-stream spatiotemporal feature extraction network SFMViT with an anchor pruning strategy. The backbone of our SFMViT is composed of ViT and SlowFast with prior knowledge of spatiotemporal action localization, which fully utilizes ViT's excellent global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. Secondly, we introduce the confidence maximum heap to prune the anchors detected in each frame of the picture to filter out the effective anchors. These designs enable our SFMViT to achieve a mAP of 26.62% in the Chaotic World dataset, far exceeding existing models. Code is available at https://github.com/jfightyr/SlowFast-Meet-ViT.

[287]  arXiv:2404.16611 [pdf, ps, other]
Title: Towards Symbiotic SAGIN Through Inter-operator Resource and Service Sharing: Joint Orchestration of User Association and Radio Resources
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)

The space-air-ground integrated network (SAGIN) is a pivotal architecture to support ubiquitous connectivity in the upcoming 6G era. Inter-operator resource and service sharing is a promising way to realize such a huge network, utilizing resources efficiently and reducing construction costs. Given the rationality of operators, the configuration of resources and services in SAGIN should focus on both the overall system performance and individual benefits of operators. Motivated by emerging symbiotic communication facilitating mutual benefits across different radio systems, we investigate the resource and service sharing in SAGIN from a symbiotic communication perspective in this paper. In particular, we consider a SAGIN consisting of a ground network operator (GNO) and a satellite network operator (SNO). Specifically, we aim to maximize the weighted sum rate (WSR) of the whole SAGIN by jointly optimizing the user association, resource allocation, and beamforming. Besides, we introduce a sharing coefficient to characterize the revenue of operators. Operators may suffer revenue loss when only focusing on maximizing the WSR. In pursuit of mutual benefits, we propose a mutual benefit constraint (MBC) to ensure that each operator obtains revenue gains. Then, we develop a centralized algorithm based on the successive convex approximation (SCA) method. Considering that the centralized algorithm is difficult to implement, we propose a distributed algorithm based on Lagrangian dual decomposition and the consensus alternating direction method of multipliers (ADMM). Finally, we provide extensive numerical simulations to demonstrate the effectiveness of the two proposed algorithms, and the distributed optimization algorithm can approach the performance of the centralized one.

[288]  arXiv:2404.16612 [pdf, other]
Title: MuseumMaker: Continual Style Customization without Catastrophic Forgetting
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Pre-trained large text-to-image (T2I) models with an appropriate text prompt has attracted growing interests in customized images generation field. However, catastrophic forgetting issue make it hard to continually synthesize new user-provided styles while retaining the satisfying results amongst learned styles. In this paper, we propose MuseumMaker, a method that enables the synthesis of images by following a set of customized styles in a never-end manner, and gradually accumulate these creative artistic works as a Museum. When facing with a new customization style, we develop a style distillation loss module to transfer the style of the whole dataset into generation of images. It can minimize the learning biases caused by content of images, and address the catastrophic overfitting issue induced by few-shot images. To deal with catastrophic forgetting amongst past learned styles, we devise a dual regularization for shared-LoRA module to optimize the direction of model update, which could regularize the diffusion model from both weight and feature aspects, respectively. Meanwhile, a unique token embedding corresponding to this new style is learned by a task-wise token learning module, which could preserve historical knowledge from past styles with the limitation of LoRA parameter quantity. As any new user-provided style come, our MuseumMaker can capture the nuances of the new styles while maintaining the details of learned styles. Experimental results on diverse style datasets validate the effectiveness of our proposed MuseumMaker method, showcasing its robustness and versatility across various scenarios.

[289]  arXiv:2404.16614 [pdf, ps, other]
Title: Derandomization with Pseudorandomness
Authors: Emin Karayel
Subjects: Logic in Computer Science (cs.LO)

Derandomization techniques are often used within advanced randomized algorithms. In particular, pseudorandom objects, such as hash families and expander graphs, are key components of such algorithms, but their verification presents a challenge. This work shows how such algorithms can be expressed and verified in Isabelle and presents a pseudorandom objects library that abstracts away the involved deep algebraic/analytic results. Moreover, it presents examples that show how the library eases and enables the verification of advanced randomized algorithms. Highlighting the value of this framework is that it was recently used to verify the optimal-space distinct elements algorithm by Blasiok from 2018, which relies on the combination of many derandomization techniques to achieve its optimality.

[290]  arXiv:2404.16616 [pdf, other]
Title: Robust Capped lp-Norm Support Vector Ordinal Regression
Subjects: Machine Learning (cs.LG)

Ordinal regression is a specialized supervised problem where the labels show an inherent order. The order distinguishes it from normal multi-class problem. Support Vector Ordinal Regression, as an outstanding ordinal regression model, is widely used in many ordinal regression tasks. However, like most supervised learning algorithms, the design of SVOR is based on the assumption that the training data are real and reliable, which is difficult to satisfy in real-world data. In many practical applications, outliers are frequently present in the training set, potentially leading to misguide the learning process, such that the performance is non-optimal. In this paper, we propose a novel capped $\ell_{p}$-norm loss function that is theoretically robust to both light and heavy outliers. The capped $\ell_{p}$-norm loss can help the model detect and eliminate outliers during training process. Adhering to this concept, we introduce a new model, Capped $\ell_{p}$-Norm Support Vector Ordinal Regression(CSVOR), that is robust to outliers. CSVOR uses a weight matrix to detect and eliminate outliers during the training process to improve the robustness to outliers. Moreover, a Re-Weighted algorithm algorithm which is illustrated convergence by our theoretical results is proposed to effectively minimize the corresponding problem. Extensive experimental results demonstrate that our model outperforms state-of-the-art(SOTA) methods, particularly in the presence of outliers.

[291]  arXiv:2404.16617 [pdf, other]
Title: Denoising: from classical methods to deep CNNs
Comments: 33 pages, 33 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); History and Overview (math.HO)

This paper aims to explore the evolution of image denoising in a pedagological way. We briefly review classical methods such as Fourier analysis and wavelet bases, highlighting the challenges they faced until the emergence of neural networks, notably the U-Net, in the 2010s. The remarkable performance of these networks has been demonstrated in studies such as Kadkhodaie et al. (2024). They exhibit adaptability to various image types, including those with fixed regularity, facial images, and bedroom scenes, achieving optimal results and biased towards geometry-adaptive harmonic basis. The introduction of score diffusion has played a crucial role in image generation. In this context, denoising becomes essential as it facilitates the estimation of probability density scores. We discuss the prerequisites for genuine learning of probability densities, offering insights that extend from mathematical research to the implications of universal structures.

[292]  arXiv:2404.16619 [pdf, other]
Title: The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge
Comments: Accepted in Grand Challenge of ICASSP 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the system upon YourTTS and add several enhancements. For further improving speaker similarity and speech quality, we introduce speaker-aware text encoder and flow-based decoder with Transformer blocks. In addition, we denoise the few-shot data, mix up them with pre-training data, and adopt a speaker-balanced sampling strategy to guarantee effective fine-tuning for target speakers. The official evaluations in track 1 show that our system achieves the best speaker similarity MOS of 4.25 and obtains considerable naturalness MOS of 3.97.

[293]  arXiv:2404.16621 [pdf, other]
Title: Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The integration of Large Language Models (LLMs) into healthcare promises to transform medical diagnostics, research, and patient care. Yet, the progression of medical LLMs faces obstacles such as complex training requirements, rigorous evaluation demands, and the dominance of proprietary models that restrict academic exploration. Transparent, comprehensive access to LLM resources is essential for advancing the field, fostering reproducibility, and encouraging innovation in healthcare AI. We present Hippocrates, an open-source LLM framework specifically developed for the medical domain. In stark contrast to previous efforts, it offers unrestricted access to its training datasets, codebase, checkpoints, and evaluation protocols. This open approach is designed to stimulate collaborative research, allowing the community to build upon, refine, and rigorously evaluate medical LLMs within a transparent ecosystem. Also, we introduce Hippo, a family of 7B models tailored for the medical domain, fine-tuned from Mistral and LLaMA2 through continual pre-training, instruction tuning, and reinforcement learning from human and AI feedback. Our models outperform existing open medical LLMs models by a large-margin, even surpassing models with 70B parameters. Through Hippocrates, we aspire to unlock the full potential of LLMs not just to advance medical knowledge and patient care but also to democratize the benefits of AI research in healthcare, making them available across the globe.

[294]  arXiv:2404.16622 [pdf, other]
Title: DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting
Comments: Accepted to CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars annotated in the image. The current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however fall behind in the total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, it outperforms the most recent detection-based counter by ~20% in detection quality and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.

[295]  arXiv:2404.16623 [pdf, other]
Title: Layered List Labeling
Comments: PODS 2024, 19 pages, 4 figures
Subjects: Data Structures and Algorithms (cs.DS)

The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, dating back to 1981, has amortized cost $O(\log^2 n)$. Subsequent work has led to improvements in three directions: \emph{low-latency} (worst-case) bounds; \emph{high-throughput} (expected) bounds; and (adaptive) bounds for \emph{important workloads}.
Perhaps surprisingly, these three directions of research have remained almost entirely disjoint -- this is because, so far, the techniques that allow for progress in one direction have forced worsening bounds in the others. Thus there would appear to be a tension between worst-case, adaptive, and expected bounds. List labeling has been proposed for use in databases at least as early as PODS'99, but a database needs good throughput, response time, and needs to adapt to common workloads (e.g., bulk loads), and no current list-labeling algorithm achieve good bounds for all three.
We show that this tension is not fundamental. In fact, with the help of new data-structural techniques, one can actually \emph{combine} any three list-labeling solutions in order to cherry-pick the best worst-case, adaptive, and expected bounds from each of them.

[296]  arXiv:2404.16624 [pdf, ps, other]
Title: Development of parallel programs on shared data-structures -- Revised version
Authors: Ketil Stølen
Subjects: Formal Languages and Automata Theory (cs.FL)

A syntax-directed formal system for the development of totally correct programs with respect to an unfair shared-state parallel while-language is proposed. The system can be understood as a compositional reformulation of the Owicki/Gries method for verification of parallel programs. Auxiliary variables are used both as a specification tool to eliminate undesirable implementations, and as a verification tool to make it possible to prove that an already finished program satisfies a particular specification. Auxiliary variables may be of any sort, and it is up to the user to define the auxiliary structure he prefers. Moreover, the auxiliary structure is only a part of the logic. This means that auxiliary variables do not have to be implemented as if they were ordinary programming variables. The system is proved sound and relatively complete with respect to an operational semantics and employed to develop three nontrivial algorithms: the Dining-Philosophers, the Bubble-Lattice-Sort and the Set-Partition algorithms. Finally, a related method for the development of (possibly nonterminating) programs with respect to four properties is described. This approach is then used to develop Dekker's algorithm.

[297]  arXiv:2404.16627 [pdf, other]
Title: Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer
Comments: Accepted at LREC-Coling 2024
Subjects: Computation and Language (cs.CL)

Unsupervised cross-lingual transfer involves transferring knowledge between languages without explicit supervision. Although numerous studies have been conducted to improve performance in such tasks by focusing on cross-lingual knowledge, particularly lexical and syntactic knowledge, current approaches are limited as they only incorporate syntactic or lexical information. Since each type of information offers unique advantages and no previous attempts have combined both, we attempt to explore the potential of this approach. In this paper, we present a novel framework called "Lexicon-Syntax Enhanced Multilingual BERT" that combines both lexical and syntactic knowledge. Specifically, we use Multilingual BERT (mBERT) as the base model and employ two techniques to enhance its learning capabilities. The code-switching technique is used to implicitly teach the model lexical alignment information, while a syntactic-based graph attention network is designed to help the model encode syntactic structure. To integrate both types of knowledge, we input code-switched sequences into both the syntactic module and the mBERT base model simultaneously. Our extensive experimental results demonstrate this framework can consistently outperform all baselines of zero-shot cross-lingual transfer, with the gains of 1.0~3.7 points on text classification, named entity recognition (ner), and semantic parsing tasks. Keywords:cross-lingual transfer, lexicon, syntax, code-switching, graph attention network

[298]  arXiv:2404.16629 [pdf, other]
Title: Implementing and Optimizing the Scaled Dot-Product Attention on Streaming Dataflow
Comments: 4 pages, 3 figures
Subjects: Hardware Architecture (cs.AR)

Transformer models serve as the backbone of many state-ofthe-art language models, and most use the scaled dot-product attention (SDPA) mechanism to capture relationships between tokens. However, the straightforward implementation of SDPA has quadratic compute and memory complexity with respect to the sequence length. On processor architectures such as GPUs and TPUs, there is a robust body of prior work. However, little work has been performed on non-processor architectures.In this work, we show how the architecture and execution model of Streaming Dataflow Accelerators can help tackle this challenge. We first define abstract hardware that adopts a streaming execution model, and we implement a cycle-accurate simulator of the abstract hardware using the Dataflow Abstract Machine simulation framework. Second, we implement the naive SDPA algorithm on this abstract hardware and show it requires linear (O(N)) intermediate memory. Third, we then modify the naive algorithm, taking inspiration from prior processor-oriented works, by reordering the multiplication and division operations. Finally, we map the modified algorithm to abstract hardware, and confirm that the implementation computes SDPA at full throughput while only using a constant amount (O(1)) of intermediate memory.

[299]  arXiv:2404.16630 [pdf, other]
Title: Legal Aspects for Software Developers Interested in Generative AI Applications
Comments: Submission under review
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Recent successes in Generative Artificial Intelligence (GenAI) have led to new technologies capable of generating high-quality code, natural language, and images. The next step is to integrate GenAI technology into products, a task typically conducted by software developers. Such product development always comes with a certain risk of liability. Within this article, we want to shed light on the current state of two such risks: data protection and copyright. Both aspects are crucial for GenAI. This technology deals with data for both model training and generated output. We summarize key aspects regarding our current knowledge that every software developer involved in product development using GenAI should be aware of to avoid critical mistakes that may expose them to liability claims.

[300]  arXiv:2404.16632 [pdf, ps, other]
Title: Introducing Systems Thinking as a Framework for Teaching and Assessing Threat Modeling Competency
Comments: Presented at the Annual Conference of the American Society for Engineering Education (ASEE'24) 2024
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Computing systems face diverse and substantial cybersecurity threats. To mitigate these cybersecurity threats, software engineers need to be competent in the skill of threat modeling. In industry and academia, there are many frameworks for teaching threat modeling, but our analysis of these frameworks suggests that (1) these approaches tend to be focused on component-level analysis rather than educating students to reason holistically about a system's cybersecurity, and (2) there is no rubric for assessing a student's threat modeling competency. To address these concerns, we propose using systems thinking in conjunction with popular and industry-standard threat modeling frameworks like STRIDE for teaching and assessing threat modeling competency. Prior studies suggest a holistic approach, like systems thinking, can help understand and mitigate cybersecurity threats. Thus, we developed and piloted two novel rubrics - one for assessing STRIDE threat modeling performance and the other for assessing systems thinking performance while conducting STRIDE.
To conduct this study, we piloted the two rubrics mentioned above to assess threat model artifacts of students enrolled in an upper-level software engineering course at Purdue University in Fall 2021, Spring 2023, and Fall 2023. Students who had both systems thinking and STRIDE instruction identified and attempted to mitigate component-level as well as systems-level threats. Students with only STRIDE instruction tended to focus on identifying and mitigating component-level threats and discounted system-level threats. We contribute to engineering education by: (1) describing a new rubric for assessing threat modeling based on systems thinking; (2) identifying trends and blindspots in students' threat modeling approach; and (3) envisioning the benefits of integrating systems thinking in threat modeling teaching and assessment.

[301]  arXiv:2404.16633 [pdf, other]
Title: Self-Balanced R-CNN for Instance Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Current state-of-the-art two-stage models on instance segmentation task suffer from several types of imbalances. In this paper, we address the Intersection over the Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, brings brand new loop mechanisms of bounding box and mask refinements. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, originated by a non-uniform integration between low- and high-level features from the backbone layers. In addition, the redesign of the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and obtains more clues to the connection between the task to solve and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements if adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 as backbone, evaluated on COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, with 12 epochs and without extra tricks. The code is available at https://github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn

[302]  arXiv:2404.16635 [pdf, other]
Title: TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
Comments: 13 pages, 11 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, which trains the model to generate Python programs for numerical calculations, and (2) reduce lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module, which gradually merges most similar vision tokens. Extensive experiments demonstrate that our 3B TinyChart achieves SOTA performance on a variety of chart understanding benchmarks including ChartQA, Chart-to-Text, Chart-to-Table, OpenCQA, and ChartX. It outperforms several chart understanding MLLM with up to 13B parameters such as ChartLlama and ChartAst, and close-sourced general-purpose MLLM GPT-4V on ChartQA. It also demonstrates its superior efficiency with higher throughput during inference due to a smaller model scale and more efficient vision encoding. Our code and model are available at https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/TinyChart.

[303]  arXiv:2404.16637 [pdf, other]
Title: Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Multi-modal foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited due to their large number of parameters and high inference time. While existing approaches have scaled down the entire CLIP architecture, we focus on training smaller variants of the image encoder, which suffices for efficient zero-shot classification. The use of synthetic data has shown promise in distilling representations from larger teachers, resulting in strong few-shot and linear probe performance. However, we find that this approach surprisingly fails in true zero-shot settings when using contrastive losses. We identify the exploitation of spurious features as being responsible for poor generalization between synthetic and real data. However, by using the image feature-based L2 distillation loss, we mitigate these problems and train students that achieve zero-shot performance which on four domain-specific datasets is on-par with a ViT-B/32 teacher model trained on DataCompXL, while featuring up to 92% fewer parameters.

[304]  arXiv:2404.16638 [pdf, other]
Title: Privacy-Preserving Statistical Data Generation: Application to Sepsis Detection
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

The biomedical field is among the sectors most impacted by the increasing regulation of Artificial Intelligence (AI) and data protection legislation, given the sensitivity of patient information. However, the rise of synthetic data generation methods offers a promising opportunity for data-driven technologies. In this study, we propose a statistical approach for synthetic data generation applicable in classification problems. We assess the utility and privacy implications of synthetic data generated by Kernel Density Estimator and K-Nearest Neighbors sampling (KDE-KNN) within a real-world context, specifically focusing on its application in sepsis detection. The detection of sepsis is a critical challenge in clinical practice due to its rapid progression and potentially life-threatening consequences. Moreover, we emphasize the benefits of KDE-KNN compared to current synthetic data generation methodologies. Additionally, our study examines the effects of incorporating synthetic data into model training procedures. This investigation provides valuable insights into the effectiveness of synthetic data generation techniques in mitigating regulatory constraints within the biomedical field.

[305]  arXiv:2404.16644 [pdf, other]
Title: Explanations in Everyday Software Systems: Towards a Taxonomy for Explainability Needs
Comments: Preprint for research paper accepted at the 32nd IEEE International Requirements Engineering 2024 conference
Subjects: Software Engineering (cs.SE)

Modern software systems are becoming increasingly complex and opaque. The integration of explanations within software has shown the potential to address this opacity and can make the system more understandable to end-users. As a result, explainability has gained much traction as a non-functional requirement of complex systems. Understanding what type of system requires what types of explanations is necessary to facilitate the inclusion of explainability in early software design processes. In order to specify explainability requirements, an explainability taxonomy that applies to a variety of different software types is needed. In this paper, we present the results of an online survey with 84 participants. We asked the participants to state their questions and confusions concerning their three most recently used software systems and elicited both explicit and implicit explainability needs from their statements. These needs were coded by three researchers. In total, we identified and classified 315 explainability needs from the survey answers. Drawing from a large pool of explainability needs and our coding procedure, we present two major contributions of this work: 1) a taxonomy for explainability needs in everyday software systems and 2) an overview of how the need for explanations differs between different types of software systems.

[306]  arXiv:2404.16645 [pdf, other]
Title: Tele-FLM Technical Report
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by BPB on textual corpus. Besides, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.

[307]  arXiv:2404.16646 [pdf, other]
Title: Improving TAS Adaptability with a Variable Temperature Threshold
Subjects: Systems and Control (eess.SY)

Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution. These methods aim to avoid issues such as accelerated aging of the device, premature failure and degraded chip performance. In this work, we implement a new TAS algorithm, VTF-TAS, which makes use of a variable temperature threshold to control task execution and thermal dissipation. To enable adequate execution of the tasks to reach their deadlines, this threshold is managed based on the theory of fluid scheduling. Using an evaluation methodology as described in POD-TAS, we evaluate VTF-TAS using a set of 4 benchmarks from the COMBS benchmark suite to examine its ability to minimize chip temperature throughout schedule execution. Through our evaluation, we demonstrate that this new algorithm is able to adaptively manage the temperature threshold such that the peak temperature during schedule execution is lower than POD-TAS, with no requirement for an expensive search procedure to obtain an optimal threshold for scheduling.

[308]  arXiv:2404.16648 [pdf, other]
Title: Comparison of adaptive mesh refinement techniques for numerical weather prediction
Comments: 30 pages
Subjects: Numerical Analysis (math.NA)

This paper examines the application of adaptive mesh refinement (AMR) in the field of numerical weather prediction (NWP). We implement and assess two distinct AMR approaches and evaluate their performance through standard NWP benchmarks. In both cases, we solve the fully compressible Euler equations, fundamental to many non-hydrostatic weather models.
The first approach utilizes oct-tree cell-based mesh refinement coupled with a high-order discontinuous Galerkin method for spatial discretization. In the second approach, we employ level-based AMR with the finite difference method. Our study provides insights into the accuracy and benefits of employing these AMR methodologies for the multi-scale problem of NWP. Additionally, we explore essential properties including their impact on mass and energy conservation. Moreover, we present and evaluate an AMR solution transfer strategy for the tree-based AMR approach that is simple to implement, memory-efficient, and ensures conservation for both flow in the box and sphere.
Furthermore, we discuss scalability, performance portability, and the practical utility of the AMR methodology within an NWP framework -- crucial considerations in selecting an AMR approach. The current de facto standard for mesh refinement in NWP employs a relatively simplistic approach of static nested grids, either within a general circulation model or a separately operated regional model with loose one-way synchronization. It is our hope that this study will stimulate further interest in the adoption of AMR frameworks like AMReX in NWP. These frameworks offer a triple advantage: a robust dynamic AMR for tracking localized and consequential features such as tropical cyclones, extreme scalability, and performance portability.

[309]  arXiv:2404.16650 [pdf, ps, other]
Title: Design optimization of advanced tow-steered composites with manufacturing constraints
Comments: 29 pages, 16 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Tow steering technologies, such as Automated fiber placement, enable the fabrication of composite laminates with curvilinear fiber, tow, or tape paths. Designers may therefore tailor tow orientations locally according to the expected local stress state within a structure, such that strong and stiff orientations of the tow are (for example) optimized to provide maximal mechanical benefit. Tow path optimization can be an effective tool in automating this design process, yet has a tendency to create complex designs that may be challenging to manufacture. In the context of tow steering, these complexities can manifest in defects such as tow wrinkling, gaps, overlaps. In this work, we implement manufacturing constraints within the tow path optimization formulation to restrict the minimum tow turning radius and the maximum density of gaps between and overlaps of tows. This is achieved by bounding the local value of the curl and divergence of the vector field associated with the tow orientations. The resulting local constraints are effectively enforced in the optimization framework through the Augmented Lagrangian method. The resulting optimization methodology is demonstrated by designing 2D and 3D structures with optimized tow orientation paths that maximize stiffness (minimize compliance) considering various levels of manufacturing restrictions. The optimized tow paths are shown to be structurally efficient and to respect imposed manufacturing constraints. As expected, the more geometrical complexity that can be achieved by the feedstock tow and placement technology, the higher the stiffness of the resulting optimized design.

[310]  arXiv:2404.16651 [pdf, other]
Title: Evolutionary Large Language Models for Hardware Security: A Comparative Survey
Subjects: Cryptography and Security (cs.CR)

Automating hardware (HW) security vulnerability detection and mitigation during the design phase is imperative for two reasons: (i) It must be before chip fabrication, as post-fabrication fixes can be costly or even impractical; (ii) The size and complexity of modern HW raise concerns about unknown vulnerabilities compromising CIA triad. While Large Language Models (LLMs) can revolutionize both HW design and testing processes, within the semiconductor context, LLMs can be harnessed to automatically rectify security-relevant vulnerabilities inherent in HW designs. This study explores the seeds of LLM integration in register transfer level (RTL) designs, focusing on their capacity for autonomously resolving security-related vulnerabilities. The analysis involves comparing methodologies, assessing scalability, interpretability, and identifying future research directions. Potential areas for exploration include developing specialized LLM architectures for HW security tasks and enhancing model performance with domain-specific knowledge, leading to reliable automated security measurement and risk mitigation associated with HW vulnerabilities.

[311]  arXiv:2404.16653 [pdf, other]
Title: Análise de ambiguidade linguística em modelos de linguagem de grande escala (LLMs)
Comments: in Portuguese language, 16 p\'aginas, 5 p\'aginas de ap\^endice e 4 imagens
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Linguistic ambiguity continues to represent a significant challenge for natural language processing (NLP) systems, notwithstanding the advancements in architectures such as Transformers and BERT. Inspired by the recent success of instructional models like ChatGPT and Gemini (In 2023, the artificial intelligence was called Bard.), this study aims to analyze and discuss linguistic ambiguity within these models, focusing on three types prevalent in Brazilian Portuguese: semantic, syntactic, and lexical ambiguity. We create a corpus comprising 120 sentences, both ambiguous and unambiguous, for classification, explanation, and disambiguation. The models capability to generate ambiguous sentences was also explored by soliciting sets of sentences for each type of ambiguity. The results underwent qualitative analysis, drawing on recognized linguistic references, and quantitative assessment based on the accuracy of the responses obtained. It was evidenced that even the most sophisticated models, such as ChatGPT and Gemini, exhibit errors and deficiencies in their responses, with explanations often providing inconsistent. Furthermore, the accuracy peaked at 49.58 percent, indicating the need for descriptive studies for supervised learning.

[312]  arXiv:2404.16656 [pdf, other]
Title: A Self-Organizing Clustering System for Unsupervised Distribution Shift Detection
Comments: Accepted manuscript in the IEEE International Joint Conference of Neural Networks (IJCNN), 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Modeling non-stationary data is a challenging problem in the field of continual learning, and data distribution shifts may result in negative consequences on the performance of a machine learning model. Classic learning tools are often vulnerable to perturbations of the input covariates, and are sensitive to outliers and noise, and some tools are based on rigid algebraic assumptions. Distribution shifts are frequently occurring due to changes in raw materials for production, seasonality, a different user base, or even adversarial attacks. Therefore, there is a need for more effective distribution shift detection techniques.
In this work, we propose a continual learning framework for monitoring and detecting distribution changes. We explore the problem in a latent space generated by a bio-inspired self-organizing clustering and statistical aspects of the latent space. In particular, we investigate the projections made by two topology-preserving maps: the Self-Organizing Map and the Scale Invariant Map. Our method can be applied in both a supervised and an unsupervised context. We construct the assessment of changes in the data distribution as a comparison of Gaussian signals, making the proposed method fast and robust. We compare it to other unsupervised techniques, specifically Principal Component Analysis (PCA) and Kernel-PCA. Our comparison involves conducting experiments using sequences of images (based on MNIST and injected shifts with adversarial samples), chemical sensor measurements, and the environmental variable related to ozone levels. The empirical study reveals the potential of the proposed approach.

[313]  arXiv:2404.16659 [pdf, other]
Title: ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling
Comments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024. Code is available at this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through fine-tuning model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL through log probability-based distribution, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verified that our method can filter unanswerable questions, which can be widely utilized even when the parameters of the model are not accessible, and that it can be effectively utilized in practice.

[314]  arXiv:2404.16660 [pdf, other]
Title: Benchmarking Mobile Device Control Agents across Diverse Configurations
Comments: Accepted (Spotlight) to ICLR 2024 Workshop on Generative Models for Decision Making. Project website: this https URL
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Developing autonomous agents for mobile devices can significantly enhance user interactions by offering increased efficiency and accessibility. However, despite the growing interest in mobile device control agents, the absence of a commonly adopted benchmark makes it challenging to quantify scientific progress in this area. In this work, we introduce B-MoCA: a novel benchmark designed specifically for evaluating mobile device control agents. To create a realistic benchmark, we develop B-MoCA based on the Android operating system and define 60 common daily tasks. Importantly, we incorporate a randomization feature that changes various aspects of mobile devices, including user interface layouts and language settings, to assess generalization performance. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs as well as agents trained from scratch using human expert demonstrations. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to enhance their effectiveness. Our source code is publicly available at https://b-moca.github.io.

[315]  arXiv:2404.16662 [pdf, other]
Title: Computing Hamiltonian Paths with Partial Order Restrictions
Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

When solving the Hamiltonian path problem it seems natural to be given additional precedence constraints for the order in which the vertices are visited. For example one could decide whether a Hamiltonian path exists for a fixed starting point, or that some vertices are visited before another vertex. We consider the problem of finding a Hamiltonian path that observes all precedence constraints given in a partial order on the vertex set. We show that this problem is $\mathsf{NP}$-complete even if restricted to complete bipartite graphs and posets of height 2. In contrast, for posets of width $k$ there is an $\mathcal{O}(k^2 n^k)$ algorithm for arbitrary graphs with $n$ vertices. We show that it is unlikely that the running time of this algorithm can be improved significantly, i.e., there is no $f(k) n^{o(k)}$ time algorithm under the assumption of the Exponential Time Hypothesis. Furthermore, for the class of outerplanar graphs, we give an $\mathcal{O}(n^2)$ algorithm for arbitrary posets.

[316]  arXiv:2404.16663 [pdf, other]
Title: Formal Specification, Assessment, and Enforcement of Fairness for Generative AIs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Logic in Computer Science (cs.LO); Software Engineering (cs.SE)

The risk of reinforcing or exacerbating societal biases and inequalities is growing as generative AI increasingly produces content that resembles human output, from text to images and beyond. Here we formally characterize the notion of fairness for generative AI as a basis for monitoring and enforcing fairness. We define two levels of fairness utilizing the concept of infinite words. The first is the fairness demonstrated on the generated sequences, which is only evaluated on the outputs while agnostic to the prompts/models used. The second is the inherent fairness of the generative AI model, which requires that fairness be manifested when input prompts are neutral, that is, they do not explicitly instruct the generative AI to produce a particular type of output. We also study relative intersectional fairness to counteract the combinatorial explosion of fairness when considering multiple categories together with lazy fairness enforcement. Our implemented specification monitoring and enforcement tool shows interesting results when tested against several generative AI models.

[317]  arXiv:2404.16665 [pdf, other]
Title: A new way of deriving implicit Runge-Kutta methods based on repeated integrals
Comments: 34 pages, 7 figures
Subjects: Numerical Analysis (math.NA)

Runge-Kutta methods have an irreplaceable position among numerical methods designed to solve ordinary differential equations. Especially, implicit ones are suitable for approximating solutions of stiff initial value problems. We propose a new way of deriving coefficients of implicit Runge-Kutta methods. This approach based on repeated integrals yields both new and well-known Butcher's tableaux. We discuss the properties of newly derived methods and compare them with standard collocation implicit Runge-Kutta methods in a series of numerical experiments. In particular, we observe higher accuracy and higher experimental order of convergence of some newly derived methods.

[318]  arXiv:2404.16666 [pdf, other]
Title: PhyRecon: Physically Plausible Neural Scene Reconstruction
Comments: project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While neural implicit representations have gained popularity in multi-view 3D reconstruction, previous work struggles to yield physically plausible results, thereby limiting their applications in physics-demanding domains like embodied AI and robotics. The lack of plausibility originates from both the absence of physics modeling in the existing pipeline and their inability to recover intricate geometrical structures. In this paper, we introduce PhyRecon, which stands as the first approach to harness both differentiable rendering and differentiable physics simulation to learn implicit surface representations. Our framework proposes a novel differentiable particle-based physical simulator seamlessly integrated with the neural implicit representation. At its core is an efficient transformation between SDF-based implicit representation and explicit surface points by our proposed algorithm, Surface Points Marching Cubes (SP-MC), enabling differentiable learning with both rendering and physical losses. Moreover, we model both rendering and physical uncertainty to identify and compensate for the inconsistent and inaccurate monocular geometric priors. The physical uncertainty additionally enables a physics-guided pixel sampling to enhance the learning of slender structures. By amalgamating these techniques, our model facilitates efficient joint modeling with appearance, geometry, and physics. Extensive experiments demonstrate that PhyRecon significantly outperforms all state-of-the-art methods in terms of reconstruction quality. Our reconstruction results also yield superior physical stability, verified by Isaac Gym, with at least a 40% improvement across all datasets, opening broader avenues for future physics-based applications.

[319]  arXiv:2404.16670 [pdf, other]
Title: EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but is still unexplored in vision emotion understanding. In this work, we focus on enhancing the model's proficiency in understanding and adhering to instructions related to emotional contexts. Initially, we identify key visual clues critical to visual emotion recognition. Subsequently, we introduce a novel GPT-assisted pipeline for generating emotion visual instruction data, effectively addressing the scarcity of annotated instruction data in this domain. Expanding on the groundwork established by InstructBLIP, our proposed EmoVIT architecture incorporates emotion-specific instruction data, leveraging the powerful capabilities of Large Language Models to enhance performance. Through extensive experiments, our model showcases its proficiency in emotion classification, adeptness in affective reasoning, and competence in comprehending humor. The comparative analysis provides a robust benchmark for Emotion Visual Instruction Tuning in the era of LLMs, providing valuable insights and opening avenues for future exploration in this domain. Our code is available at \url{https://github.com/aimmemotion/EmoVIT}.

[320]  arXiv:2404.16672 [pdf, other]
Title: RUMOR: Reinforcement learning for Understanding a Model of the Real World for Navigation in Dynamic Environments
Subjects: Robotics (cs.RO)

Autonomous navigation in dynamic environments is a complex but essential task for autonomous robots, with recent deep reinforcement learning approaches showing promising results. However, the complexity of the real world makes it infeasible to train agents in every possible scenario configuration. Moreover, existing methods typically overlook factors such as robot kinodynamic constraints, or assume perfect knowledge of the environment. In this work, we present RUMOR, a novel planner for differential-drive robots that uses deep reinforcement learning to navigate in highly dynamic environments. Unlike other end-to-end DRL planners, it uses a descriptive robocentric velocity space model to extract the dynamic environment information, enhancing training effectiveness and scenario interpretation. Additionally, we propose an action space that inherently considers robot kinodynamics and train it in a simulator that reproduces the real world problematic aspects, reducing the gap between the reality and simulation. We extensively compare RUMOR with other state-of-the-art approaches, demonstrating a better performance, and provide a detailed analysis of the results. Finally, we validate RUMOR's performance in real-world settings by deploying it on a ground robot. Our experiments, conducted in crowded scenarios and unseen environments, confirm the algorithm's robustness and transferability.

[321]  arXiv:2404.16676 [pdf, ps, other]
Title: Multilayer Correlation Clustering
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

In this paper, we establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$. The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers, based on the well-known region growing technique. We then study an important special case of our problem, namely the problem with the probability constraint. For this case, we first give an $(\alpha+2)$-approximation algorithm, where $\alpha$ is any possible approximation ratio for the single-layer counterpart. For instance, we can take $\alpha=2.5$ in general (Ailon et al., JACM '08) and $\alpha=1.73+\epsilon$ for the unweighted case (Cohen-Addad et al., FOCS '23). Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $\alpha+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets demonstrate the effectiveness of our proposed algorithms.

[322]  arXiv:2404.16678 [pdf, other]
Title: Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausible semantics. To overcome the artifacts introduced by the diffusion prior, we apply the luminance conditional guidance. Moreover, we adopt multimodal high-level semantic priors to help the model understand the image content and deliver saturated colors. Besides, a luminance-aware decoder is designed to restore details and enhance overall visual quality. The proposed pipeline synthesizes saturated colors while maintaining plausible semantics. Experiments indicate that our proposed method considers both diversity and fidelity, surpassing previous methods in terms of perceptual realism and gain most human preference.

[323]  arXiv:2404.16684 [pdf, ps, other]
Title: Monolithic two-level Schwarz preconditioner for Biot's consolidation model in two space dimensions
Comments: 33 pages
Subjects: Numerical Analysis (math.NA)

This paper addresses the construction and analysis of a class of domain decomposition methods for the iterative solution of the quasi-static Biot problem in three-field formulation. The considered discrete model arises from time discretization by the implicit Euler method and space discretization by a family of strongly mass-conserving methods exploiting $H^{div}$-conforming approximations of the solid displacement and fluid flux fields. For the resulting saddle-point problem, we construct monolithic overlapping domain decomposition (DD) methods whose analysis relies on a transformation into an equivalent symmetric positive definite system and on stable decompositions of the involved finite element spaces under proper problem-dependent norms. Numerical results on two-dimensional test problems are in accordance with the provided theoretical uniform convergence estimates for the two-level multiplicative Schwarz method.

[324]  arXiv:2404.16685 [pdf, other]
Title: Multi-scale HSV Color Feature Embedding for High-fidelity NIR-to-RGB Spectrum Translation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities within NIR inputs and RGB outputs. Thus, existing methods fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping process into three sub-tasks, including NIR texture maintenance, coarse geometry reconstruction, and RGB color prediction. Thus, we propose three key modules for each corresponding sub-task: the Texture Preserving Block (TPB), the HSV Color Feature Embedding Module (HSV-CFEM), and the Geometry Reconstruction Module (GRM). These modules contribute to our MCFNet methodically tackling spectral translation through a series of escalating resolutions, progressively enriching images with color and texture fidelity in a scale-coherent fashion. The proposed MCFNet demonstrates substantial performance gains over the NIR image colorization task. Code is released at: https://github.com/AlexYangxx/MCFNet.

[325]  arXiv:2404.16687 [pdf, other]
Title: NTIRE 2024 Quality Assessment of AI-Generated Content Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

[326]  arXiv:2404.16688 [pdf, other]
Title: Reusing Deep Learning Models: Challenges and Directions in Software Engineering
Comments: Proceedings of the IEEE John Vincent Atanasoff Symposium on Modern Computing (JVA'23) 2023
Subjects: Software Engineering (cs.SE)

Deep neural networks (DNNs) achieve state-of-the-art performance in many areas, including computer vision, system configuration, and question-answering. However, DNNs are expensive to develop, both in intellectual effort (e.g., devising new architectures) and computational costs (e.g., training). Reusing DNNs is a promising direction to amortize costs within a company and across the computing industry. As with any new technology, however, there are many challenges in reusing DNNs. These challenges include both missing technical capabilities and missing engineering practices.
This vision paper describes challenges in current approaches to DNN re-use. We summarize studies of re-use failures across the spectrum of re-use techniques, including conceptual (e.g., reusing based on a research paper), adaptation (e.g., re-using by building on an existing implementation), and deployment (e.g., direct re-use on a new device). We outline possible advances that would improve each kind of re-use.

[327]  arXiv:2404.16689 [pdf, other]
Title: Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents
Subjects: Artificial Intelligence (cs.AI)

While Poker, as a family of games, has been studied extensively in the last decades, collectible card games have seen relatively little attention. Only recently have we seen an agent that can compete with professional human players in Hearthstone, one of the most popular collectible card games. Although artificial agents must be able to work with imperfect information in both of these genres, collectible card games pose another set of distinct challenges. Unlike in many poker variants, agents must deal with state space so vast that even enumerating all states consistent with the agent's beliefs is intractable, rendering the current search methods unusable and requiring the agents to opt for other techniques. In this paper, we investigate the strength of such techniques for this class of games. Namely, we present preliminary analysis results of ByteRL, the state-of-the-art agent in Legends of Code and Magic and Hearthstone. Although ByteRL beat a top-10 Hearthstone player from China, we show that its play in Legends of Code and Magic is highly exploitable.

[328]  arXiv:2404.16692 [pdf, ps, other]
Title: Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4
Comments: 29 pages, 2 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We explored the addition bias, a cognitive tendency to prefer adding elements over removing them to alter an initial state or structure, by conducting four preregistered experiments examining the problem-solving behavior of both humans and OpenAl's GPT-4 large language model. The experiments involved 588 participants from the U.S. and 680 iterations of the GPT-4 model. The problem-solving task was either to create symmetry within a grid (Experiments 1 and 3) or to edit a summary (Experiments 2 and 4). As hypothesized, we found that overall, the addition bias was present. Solution efficiency (Experiments 1 and 2) and valence of the instruction (Experiments 3 and 4) played important roles. Human participants were less likely to use additive strategies when subtraction was relatively more efficient than when addition and subtraction were equally efficient. GPT-4 exhibited the opposite behavior, with a strong addition bias when subtraction was more efficient. In terms of instruction valence, GPT-4 was more likely to add words when asked to "improve" compared to "edit", whereas humans did not show this effect. When we looked at the addition bias under different conditions, we found more biased responses for GPT-4 compared to humans. Our findings highlight the importance of considering comparable and sometimes superior subtractive alternatives, as well as reevaluating one's own and particularly the language models' problem-solving behavior.

[329]  arXiv:2404.16694 [pdf, ps, other]
Title: A non-separable progressive multivariate WENO-$2r$ point value
Subjects: Numerical Analysis (math.NA)

The weighted essentially non-oscillatory {technique} using a stencil of $2r$ points (WENO-$2r$) is an interpolatory method that consists in obtaining a higher approximation order from the non-linear combination of interpolants of $r+1$ nodes. The result is an interpolant of order $2r$ at the smooth parts and order $r+1$ when an isolated discontinuity falls at any grid interval of the large stencil except at the central one. Recently, a new WENO method based on Aitken-Neville's algorithm has been designed for interpolation of equally spaced data at the mid-points and presents progressive order of accuracy close to discontinuities. This paper is devoted to constructing a general progressive WENO method for non-necessarily uniformly spaced data and several variables interpolating in any point of the central interval. Also, we provide explicit formulas for linear and non-linear weights and prove the order obtained. Finally, some numerical experiments are presented to check the theoretical results.

[330]  arXiv:2404.16695 [pdf, other]
Title: Kernelization Dichotomies for Hitting Subgraphs under Structural Parameterizations
Comments: 58 pages, 7 figures
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO)

For a fixed graph $H$, the $H$-SUBGRAPH HITTING problem consists in deleting the minimum number of vertices from an input graph to obtain a graph without any occurrence of $H$ as a subgraph. This problem can be seen as a generalization of VERTEX COVER, which corresponds to the case $H = K_2$. We initiate a study of $H$-SUBGRAPH HITTING from the point of view of characterizing structural parameterizations that allow for polynomial kernels, within the recently active framework of taking as the parameter the number of vertex deletions to obtain a graph in a "simple" class $C$. Our main contribution is to identify graph parameters that, when $H$-SUBGRAPH HITTING is parameterized by the vertex-deletion distance to a class $C$ where any of these parameters is bounded, and assuming standard complexity assumptions and that $H$ is biconnected, allow us to prove the following sharp dichotomy: the problem admits a polynomial kernel if and only if $H$ is a clique. These new graph parameters are inspired by the notion of $C$-elimination distance introduced by Bulian and Dawar [Algorithmica 2016], and generalize it in two directions. Our results also apply to the version of the problem where one wants to hit $H$ as an induced subgraph, and imply in particular, that the problems of hitting minors and hitting (induced) subgraphs have a substantially different behavior with respect to the existence of polynomial kernels under structural parameterizations.

[331]  arXiv:2404.16698 [pdf, other]
Title: Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents
Subjects: Computation and Language (cs.CL)

In the rapidly evolving field of artificial intelligence, ensuring safe decision-making of Large Language Models (LLMs) is a significant challenge. This paper introduces Governance of the Commons Simulation (GovSim), a simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. Through this simulation environment, we explore the dynamics of resource sharing among AI agents, highlighting the importance of ethical considerations, strategic planning, and negotiation skills. GovSim is versatile and supports any text-based agent, including LLMs agents. Using the Generative Agent framework, we create a standard agent that facilitates the integration of different LLMs. Our findings reveal that within GovSim, only two out of 15 tested LLMs managed to achieve a sustainable outcome, indicating a significant gap in the ability of models to manage shared resources. Furthermore, we find that by removing the ability of agents to communicate, they overuse the shared resource, highlighting the importance of communication for cooperation. Interestingly, most LLMs lack the ability to make universalized hypotheses, which highlights a significant weakness in their reasoning skills. We open source the full suite of our research results, including the simulation environment, agent prompts, and a comprehensive web interface.

[332]  arXiv:2404.16699 [pdf, other]
Title: Generalized cyclic symmetric decompositions for the matrix multiplication tensor
Comments: arXiv admin note: text overlap with arXiv:2310.02794
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)

In this paper we propose a new generalized cyclic symmetric structure in the factor matrices of polyadic decompositions of matrix multiplication tensors for non-square matrix multiplication to reduce the number of variables in the optimization problem and in this way improve the convergence.

[333]  arXiv:2404.16701 [pdf, ps, other]
Title: On the Streaming Complexity of Expander Decomposition
Subjects: Data Structures and Algorithms (cs.DS)

In this paper we study the problem of finding $(\epsilon, \phi)$-expander decompositions of a graph in the streaming model, in particular for dynamic streams of edge insertions and deletions. The goal is to partition the vertex set so that every component induces a $\phi$-expander, while the number of inter-cluster edges is only an $\epsilon$ fraction of the total volume. It was recently shown that there exists a simple algorithm to construct a $(O(\phi \log n), \phi)$-expander decomposition of an $n$-vertex graph using $\widetilde{O}(n/\phi^2)$ bits of space [Filtser, Kapralov, Makarov, ITCS'23]. This result calls for understanding the extent to which a dependence in space on the sparsity parameter $\phi$ is inherent. We move towards answering this question on two fronts. We prove that a $(O(\phi \log n), \phi)$-expander decomposition can be found using $\widetilde{O}(n)$ space, for every $\phi$. At the core of our result is the first streaming algorithm for computing boundary-linked expander decompositions, a recently introduced strengthening of the classical notion [Goranci et al., SODA'21]. The key advantage is that a classical sparsifier [Fung et al., STOC'11], with size independent of $\phi$, preserves the cuts inside the clusters of a boundary-linked expander decomposition within a multiplicative error. Notable algorithmic applications use sequences of expander decompositions, in particular one often repeatedly computes a decomposition of the subgraph induced by the inter-cluster edges (e.g., the seminal work of Spielman and Teng on spectral sparsifiers [Spielman, Teng, SIAM Journal of Computing 40(4)], or the recent maximum flow breakthrough [Chen et al., FOCS'22], among others). We prove that any streaming algorithm that computes a sequence of $(O(\phi \log n), \phi)$-expander decompositions requires ${\widetilde{\Omega}}(n/\phi)$ bits of space, even in insertion only streams.

[334]  arXiv:2404.16705 [pdf, other]
Title: SHINE: Social Homology Identification for Navigation in Crowded Environments
Subjects: Robotics (cs.RO)

Navigating mobile robots in social environments remains a challenging task due to the intricacies of human-robot interactions. Most of the motion planners designed for crowded and dynamic environments focus on choosing the best velocity to reach the goal while avoiding collisions, but do not explicitly consider the high-level navigation behavior (avoiding through the left or right side, letting others pass or passing before others, etc.). In this work, we present a novel motion planner that incorporates topology distinct paths representing diverse navigation strategies around humans. The planner selects the topology class that imitates human behavior the best using a deep neural network model trained on real-world human motion data, ensuring socially intelligent and contextually aware navigation. Our system refines the chosen path through an optimization-based local planner in real time, ensuring seamless adherence to desired social behaviors. In this way, we decouple perception and local planning from the decision-making process. We evaluate the prediction accuracy of the network with real-world data. In addition, we assess the navigation capabilities in both simulation and a real-world platform, comparing it with other state-of-the-art planners. We demonstrate that our planner exhibits socially desirable behaviors and shows a smooth and remarkable performance.

[335]  arXiv:2404.16706 [pdf, other]
Title: Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility.
The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and only require logarithmic or polylogarithmic space (and time).
Our first approach is based on a space-efficient streaming matrix multiplication algorithm for a class of Toeplitz matrices. We show that to instantiate this algorithm for DP continual counting, it is sufficient to find a low-degree rational function that approximates the square root on a circle in the complex plane. We then apply and extend tools from approximation theory to achieve this. We also derive efficient closed-forms for the objective function for arbitrarily many steps, and show direct numerical optimization yields a highly practical solution to the problem. Our second approach combines our first approach with a recursive construction similar to the binary tree mechanism.

[336]  arXiv:2404.16710 [pdf, other]
Title: Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
Comments: Code open sourcing is in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.

[337]  arXiv:2404.16717 [pdf, other]
Title: Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class
Comments: Accepted to FAccT 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real world objects such as pears appear in a variety of forms -- from diced to whole, on a table or in a bowl -- yet standard VLM classifiers map all instances of a class to a \it{single vector based on the class label}. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing hierarchies, diverse object states, and real-world geographic diversity, as well finer-grained datasets where intra-class diversity may be less prevalent. Importantly, our method is inherently interpretable, offering faithful explanations for each inference to facilitate model debugging and enhance transparency. We also find our method scales efficiently to a large number of attributes to account for diversity -- leading to more accurate predictions for atypical instances. Finally, we characterize a principled trade-off between overall and worst class accuracy, which can be tuned via a hyperparameter of our method. We hope this work spurs further research into the promise of zero-shot classification beyond a single class vector for capturing diversity in the world, and building transparent AI systems without compromising performance.

[338]  arXiv:2404.16721 [pdf, other]
Title: Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods
Comments: 7 pages, 4 figures, double blind under review
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper presents a novel learning approach for Dubins Traveling Salesman Problems(DTSP) with Neighborhood (DTSPN) to quickly produce a tour of a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: initially, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the LinKernighan heuristic (LKH) algorithm. Subsequently, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data was also devised to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL with demonstration schemes, most of which fail to sense all the task points.

[339]  arXiv:2404.16722 [pdf, other]
Title: Clique Is Hard on Average for Sherali-Adams with Bounded Coefficients
Comments: This is the full-length version of a paper with the title "Clique Is Hard on Average for Unary Sherali-Adams" that appeared in the Proceedings of the 64th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2023)
Subjects: Computational Complexity (cs.CC)

We prove that Sherali-Adams with polynomially bounded coefficients requires proofs of size $n^{\Omega(d)}$ to rule out the existence of an $n^{\Theta(1)}$-clique in Erd\H{o}s-R\'{e}nyi random graphs whose maximum clique is of size $d\leq 2\log n$. This lower bound is tight up to the multiplicative constant in the exponent. We obtain this result by introducing a technique inspired by pseudo-calibration which may be of independent interest. The technique involves defining a measure on monomials that precisely captures the contribution of a monomial to a refutation. This measure intuitively captures progress and should have further applications in proof complexity.

[340]  arXiv:2404.16723 [pdf, other]
Title: Constrained Level Planarity is FPT with Respect to the Vertex Cover Number
Comments: Extended version of a paper to appear in the proceedings of the 51st EATCS International Colloquium on Automata, Languages and Programming (ICALP 2024)
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG)

The problem Level Planarity asks for a crossing-free drawing of a graph in the plane such that vertices are placed at prescribed y-coordinates (called levels) and such that every edge is realized as a y-monotone curve. In the variant Constrained Level Planarity, each level y is equipped with a partial order <_y on its vertices and in the desired drawing the left-to-right order of vertices on level y has to be a linear extension of <_y. Constrained Level Planarity is known to be a remarkably difficult problem: previous results by Klemz and Rote [ACM Trans. Alg. 2019] and by Br\"uckner and Rutter [SODA 2017] imply that it remains NP-hard even when restricted to graphs whose tree-depth and feedback vertex set number are bounded by a constant and even when the instances are additionally required to be either proper, meaning that each edge spans two consecutive levels, or ordered, meaning that all given partial orders are total orders. In particular, these results rule out the existence of FPT-time (even XP-time) algorithms with respect to these and related graph parameters (unless P=NP). However, the parameterized complexity of Constrained Level Planarity with respect to the vertex cover number of the input graph remained open.
In this paper, we show that Constrained Level Planarity can be solved in FPT-time when parameterized by the vertex cover number. In view of the previous intractability statements, our result is best-possible in several regards: a speed-up to polynomial time or a generalization to the aforementioned smaller graph parameters is not possible, even if restricting to proper or ordered instances.

[341]  arXiv:2404.16724 [pdf, other]
Title: Tverberg's theorem and multi-class support vector machines
Authors: Pablo Soberón
Comments: 15 pages, 3 figures
Subjects: Machine Learning (cs.LG)

We show how, using linear-algebraic tools developed to prove Tverberg's theorem in combinatorial geometry, we can design new models of multi-class support vector machines (SVMs). These supervised learning protocols require fewer conditions to classify sets of points, and can be computed using existing binary SVM algorithms in higher-dimensional spaces, including soft-margin SVM algorithms. We describe how the theoretical guarantees of standard support vector machines transfer to these new classes of multi-class support vector machines. We give a new simple proof of a geometric characterization of support vectors for largest margin SVMs by Veelaert.

[342]  arXiv:2404.16725 [pdf, ps, other]
Title: Approximation Algorithms for Hop Constrained and Buy-at-Bulk Network Design via Hop Constrained Oblivious Routing
Subjects: Data Structures and Algorithms (cs.DS)

We consider two-cost network design models in which edges of the input graph have an associated cost and length. We build upon recent advances in hop-constrained oblivious routing to obtain two sets of results.
We address multicommodity buy-at-bulk network design in the nonuniform setting. Existing poly-logarithmic approximations are based on the junction tree approach [CHKS09,KN11]. We obtain a new polylogarithmic approximation via a natural LP relaxation. This establishes an upper bound on its integrality gap and affirmatively answers an open question raised in [CHKS09]. The rounding is based on recent results in hop-constrained oblivious routing [GHZ21], and this technique yields a polylogarithmic approximation in more general settings such as set connectivity. Our algorithm for buy-at-bulk network design is based on an LP-based reduction to hop constrained network design for which we obtain LP-based bicriteria approximation algorithms.
We also consider a fault-tolerant version of hop constrained network design where one wants to design a low-cost network to guarantee short paths between a given set of source-sink pairs even when k-1 edges can fail. This model has been considered in network design [GL17,GML18,AJL20] but no approximation algorithms were known. We obtain polylogarithmic bicriteria approximation algorithms for the single-source setting for any fixed k. We build upon the single-source algorithm and the junction-tree approach to obtain an approximation algorithm for the multicommodity setting when at most one edge can fail.

[343]  arXiv:2404.16726 [pdf, other]
Title: History repeats itself: A Baseline for Temporal Knowledge Graph Forecasting
Comments: Accepted at IJCAI 2024
Subjects: Machine Learning (cs.LG)

Temporal Knowledge Graph (TKG) Forecasting aims at predicting links in Knowledge Graphs for future timesteps based on a history of Knowledge Graphs. To this day, standardized evaluation protocols and rigorous comparison across TKG models are available, but the importance of simple baselines is often neglected in the evaluation, which prevents researchers from discerning actual and fictitious progress. We propose to close this gap by designing an intuitive baseline for TKG Forecasting based on predicting recurring facts. Compared to most TKG models, it requires little hyperparameter tuning and no iterative training. Further, it can help to identify failure modes in existing approaches. The empirical findings are quite unexpected: compared to 11 methods on five datasets, our baseline ranks first or third in three of them, painting a radically different picture of the predictive quality of the state of the art.

[344]  arXiv:2404.16727 [pdf, other]
Title: Learning-Based Efficient Approximation of Data-enabled Predictive Control
Subjects: Systems and Control (eess.SY)

Data-Enabled Predictive Control (DeePC) bypasses the need for system identification by directly leveraging raw data to formulate optimal control policies. However, the size of the optimization problem in DeePC grows linearly with respect to the data size, which prohibits its application due to high computational costs. In this paper, we propose an efficient approximation of DeePC, whose size is invariant with respect to the amount of data collected, via differentiable convex programming. Specifically, the optimization problem in DeePC is decomposed into two parts: a control objective and a scoring function that evaluates the likelihood of a guessed I/O sequence, the latter of which is approximated with a size-invariant learned optimization problem. The proposed method is validated through numerical simulations on a quadruple tank system, illustrating that the learned controller can reduce the computational time of DeePC by 5x while maintaining its control performance.

[345]  arXiv:2404.16730 [pdf, other]
Title: Finch: Sparse and Structured Array Programming with Control Flow
Subjects: Mathematical Software (cs.MS)

From FORTRAN to NumPy, arrays have revolutionized how we express computation. However, arrays in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the array structures and program control flow they support to better simplify the problem.
In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured arrays by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of array structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, graph analytics, and a high-level tensor operator fusion interface.

[346]  arXiv:2404.16734 [pdf, ps, other]
Title: Uniform Substitution for Differential Refinement Logic
Subjects: Logic in Computer Science (cs.LO)

This paper introduces a uniform substitution calculus for differential refinement logic dRL. The logic dRL extends the differential dynamic logic dL such that one can simultaneously reason about properties of and relations between hybrid systems. Refinements is useful e.g. for simplifying proofs by relating a concrete hybrid system to an abstract one from which the property can be proved more easily. Uniform substitution is the key to parsimonious prover microkernels. It enables the verbatim use of single axiom formulas instead of axiom schemata with soundness-critical side conditions scattered across the proof calculus. The uniform substitution rule can then be used to instantiate all axioms soundly. Access to differential variables in dRL enables more control over the notion of refinement, which is shown to be decidable on a fragment of hybrid programs.

[347]  arXiv:2404.16737 [pdf, ps, other]
Title: Open Source Software (OSS) Transparency for DoD Acquisition
Comments: Naval Post-graduate School, Monterey, CA, US, May 8-9 2024
Subjects: Software Engineering (cs.SE)

Caveat emptor, or let the buyer beware, is commonly attributed to open source software (OSS)-the onus is on the OSS consumer to ensure that it is fit for use in the consumer's context. OSS has been compared to an open market bazaar where consumers are free to browse all the source code and take a copy. In this paper, we observe challenges for the OSS consumer to obtain information about the process(es), project(s) used to produce a product and the protection(s) employed by those projects. We discuss the need for more transparency by OSS projects, where possible and introduce a framework for reasoning about those OSS projects and their products for use by the OSS consumer.

[348]  arXiv:2404.16739 [pdf, ps, other]
Title: CBRW: A Novel Approach for Cancelable Biometric Template Generation based on
Authors: Nitin Kumar, Manisha
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cancelable Biometric is a challenging research field in which security of an original biometric image is ensured by transforming the original biometric into another irreversible domain. Several approaches have been suggested in literature for generating cancelable biometric templates. In this paper, two novel and simple cancelable biometric template generation methods based on Random Walk (CBRW) have been proposed. By employing random walk and other steps given in the proposed two algorithms viz. CBRW-BitXOR and CBRW-BitCMP, the original biometric is transformed into a cancellable template. The performance of the proposed methods is compared with other state-of-the-art methods. Experiments have been performed on eight publicly available gray and color datasets i.e. CP (ear) (gray and color), UTIRIS (iris) (gray and color), ORL (face) (gray), IIT Delhi (iris) (gray and color), and AR (face) (color). Performance of the generated templates is measured in terms of Correlation Coefficient (Cr), Root Mean Square Error (RMSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity (SSIM), Mean Absolute Error (MAE), Number of Pixel Change Rate (NPCR), and Unified Average Changing Intensity (UACI). By experimental results, it has been proved that proposed methods are superior than other state-of-the-art methods in qualitative as well as quantitative analysis. Furthermore, CBRW performs better on both gray as well as color images.

[349]  arXiv:2404.16741 [pdf, other]
Title: Parameterized Complexity of Efficient Sortation
Subjects: Data Structures and Algorithms (cs.DS)

A crucial challenge arising in the design of large-scale logistical networks is to optimize parcel sortation for routing. We study this problem under the recent graph-theoretic formalization of Van Dyk, Klause, Koenemann and Megow (IPCO 2024). The problem asks - given an input digraph D (the fulfillment network) together with a set of commodities represented as source-sink tuples - for a minimum-outdegree subgraph H of the transitive closure of D that contains a source-sink route for each of the commodities. Given the underlying motivation, we study two variants of the problem which differ in whether the routes for the commodities are assumed to be given, or can be chosen arbitrarily.
We perform a thorough parameterized analysis of the complexity of both problems. Our results concentrate on three fundamental parameterizations of the problem: (1) When attempting to parameterize by the target outdegree of H, we show that the problems are paraNP-hard even in highly restricted cases; (2) When parameterizing by the number of commodities, we utilize Ramsey-type arguments, kernelization and treewidth reduction techniques to obtain parameterized algorithms for both problems; (3) When parameterizing by the structure of D, we establish fixed-parameter tractability for both problems w.r.t. treewidth, maximum degree and the maximum routing length. We combine this with lower bounds which show that omitting any of the three parameters results in paraNP-hardness.

[350]  arXiv:2404.16743 [pdf, other]
Title: Automatic Speech Recognition System-Independent Word Error Rate Estimatio
Comments: Accepted to LREC-COLING 2024 (long)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). These are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated using phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method reaches a similar performance to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by relative 17.58% and 18.21%, respectively, on Switchboard and CALLHOME. The performance was further improved when the WER of the training set was close to the WER of the evaluation dataset.

[351]  arXiv:2404.16744 [pdf, ps, other]
Title: JITScanner: Just-in-Time Executable Page Check in the Linux Operating System
Subjects: Cryptography and Security (cs.CR)

Modern malware poses a severe threat to cybersecurity, continually evolving in sophistication. To combat this threat, researchers and security professionals continuously explore advanced techniques for malware detection and analysis. Dynamic analysis, a prevalent approach, offers advantages over static analysis by enabling observation of runtime behavior and detecting obfuscated or encrypted code used to evade detection. However, executing programs within a controlled environment can be resource-intensive, often necessitating compromises, such as limiting sandboxing to an initial period. In our article, we propose an alternative method for dynamic executable analysis: examining the presence of malicious signatures within executable virtual pages precisely when their current content, including any updates over time, is accessed for instruction fetching. Our solution, named JITScanner, is developed as a Linux-oriented package built upon a Loadable Kernel Module (LKM). It integrates a user-level component that communicates efficiently with the LKM using scalable multi-processor/core technology. JITScanner's effectiveness in detecting malware programs and its minimal intrusion in normal runtime scenarios have been extensively tested, with the experiment results detailed in this article. These experiments affirm the viability of our approach, showcasing JITScanner's capability to effectively identify malware while minimizing runtime overhead.

[352]  arXiv:2404.16748 [pdf, other]
Title: TELA: Text to Layer-wise 3D Clothed Human Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper addresses the task of 3D clothed human generation from textural descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle for clothing editing and meanwhile lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed human representation combined with a progressive optimization strategy, which produces clothing-disentangled 3D human models while providing control capacity for the generation process. The basic idea is progressively generating a minimal-clothed human body and layer-wise clothes. During clothing generation, a novel stratified compositional rendering method is proposed to fuse multi-layer human models, and a new loss function is utilized to help decouple the clothing model from the human body. The proposed method achieves high-quality disentanglement, which thereby provides an effective way for 3D garment generation. Extensive experiments demonstrate that our approach achieves state-of-the-art 3D clothed human generation while also supporting cloth editing applications such as virtual try-on. Project page: this http URL

[353]  arXiv:2404.16752 [pdf, other]
Title: TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy. The current best methods leverage large datasets of 3D pseudo-ground-truth (p-GT) and 2D keypoints, leading to robust performance. With such methods, we observe a paradoxical decline in 3D pose accuracy with increasing 2D accuracy. This is caused by biases in the p-GT and the use of an approximate camera projection model. We quantify the error induced by current camera models and show that fitting 2D keypoints and p-GT accurately causes incorrect 3D poses. Our analysis defines the invalid distances within which minimizing 2D and p-GT losses is detrimental. We use this to formulate a new loss Threshold-Adaptive Loss Scaling (TALS) that penalizes gross 2D and p-GT losses but not smaller ones. With such a loss, there are many 3D poses that could equally explain the 2D evidence. To reduce this ambiguity we need a prior over valid human poses but such priors can introduce unwanted bias. To address this, we exploit a tokenized representation of human pose and reformulate the problem as token prediction. This restricts the estimated poses to the space of valid poses, effectively providing a uniform prior. Extensive experiments on the EMDB and 3DPW datasets show that our reformulated keypoint loss and tokenization allows us to train on in-the-wild data while improving 3D accuracy over the state-of-the-art. Our models and code are available for research at https://tokenhmr.is.tue.mpg.de.

[354]  arXiv:2404.16754 [pdf, other]
Title: RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Developing generalist foundation model has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the requirements on developing open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models, to extend the original datasets (over 25,692 non-contrast 3D chest CT volume and reports from 20,000 patients) from the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665 K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of CT volume in the form of a segmentation mask; (iii) 1.3 M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.

[355]  arXiv:2404.16764 [pdf, other]
Title: Dataset of Quotation Attribution in German News Articles
Comments: To be published at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)

Extracting who says what to whom is a crucial part in analyzing human communication in today's abundance of data such as online news articles. Yet, the lack of annotated data for this task in German news articles severely limits the quality and usability of possible systems. To remedy this, we present a new, freely available, creative-commons-licensed dataset for quotation attribution in German news articles based on WIKINEWS. The dataset provides curated, high-quality annotations across 1000 documents (250,000 tokens) in a fine-grained annotation schema enabling various downstream uses for the dataset. The annotations not only specify who said what but also how, in which context, to whom and define the type of quotation. We specify our annotation schema, describe the creation of the dataset and provide a quantitative analysis. Further, we describe suitable evaluation metrics, apply two existing systems for quotation attribution, discuss their results to evaluate the utility of our dataset and outline use cases of our dataset in downstream tasks.

[356]  arXiv:2404.16766 [pdf, other]
Title: Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of foundation large language model (LLM) to specific preferences, concerns have been raised about the depth of this alignment, with some critiques suggesting it is merely "superficial". We critically examine this hypothesis within the scope of cross-lingual generation tasks, proposing that the effectiveness of SFT may be constrained by its reliance on prior tokens to guide cross-lingual generation. Based on this crucial insight, and in response to the challenges posed by the costly and limited availability of non-English data for SFT, we introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens to bridge the foundation LLM and the SFT LLM, achieving comparable performance without training. Experiments on machine translation and part-of-speech tagging across eight languages demonstrate the efficacy of PreTTY in cross-lingual settings. Remarkably, by initiating the decoding process with only one or two prior tokens, foundation LLMs can achieve performance comparable to their SFT counterparts. This method presents a cost-effective alternative to SFT and advances the democratization of multilingual LLMs.

[357]  arXiv:2404.16767 [pdf, other]
Title: REBEL: Reinforcement Learning via Regressing Relative Rewards
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping) and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative rewards via a direct policy parameterization between two completions to a prompt, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally tractable than PPO.

[358]  arXiv:2404.16768 [pdf, ps, other]
Title: Redefining Safety for Autonomous Vehicles
Comments: 18 pages, SafeComp 2024 draft preprint
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Existing definitions and associated conceptual frameworks for computer-based system safety should be revisited in light of real-world experiences from deploying autonomous vehicles. Current terminology used by industry safety standards emphasizes mitigation of risk from specifically identified hazards, and carries assumptions based on human-supervised vehicle operation. Operation without a human driver dramatically increases the scope of safety concerns, especially due to operation in an open world environment, a requirement to self-enforce operational limits, participation in an ad hoc sociotechnical system of systems, and a requirement to conform to both legal and ethical constraints. Existing standards and terminology only partially address these new challenges. We propose updated definitions for core system safety concepts that encompass these additional considerations as a starting point for evolving safe-ty approaches to address these additional safety challenges. These results might additionally inform framing safety terminology for other autonomous system applications.

[359]  arXiv:2404.16771 [pdf, other]
Title: ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Diffusion-based technologies have made significant strides, particularly in personalized and customized facialgeneration. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID)consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation by fully considering intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverseidentity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. % such as LAION-Face, CelebA, FFHQ, and SFHQ. Experimental results substantiate that our ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods in the MyStyle dataset. Furthermore, while ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.

[360]  arXiv:2404.16773 [pdf, other]
Title: ConKeD++ -- Improving descriptor learning for retinal image registration: A comprehensive study of contrastive losses
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Self-supervised contrastive learning has emerged as one of the most successful deep learning paradigms. In this regard, it has seen extensive use in image registration and, more recently, in the particular field of medical image registration. In this work, we propose to test and extend and improve a state-of-the-art framework for color fundus image registration, ConKeD. Using the ConKeD framework we test multiple loss functions, adapting them to the framework and the application domain. Furthermore, we evaluate our models using the standarized benchmark dataset FIRE as well as several datasets that have never been used before for color fundus registration, for which we are releasing the pairing data as well as a standardized evaluation approach. Our work demonstrates state-of-the-art performance across all datasets and metrics demonstrating several advantages over current SOTA color fundus registration methods

[361]  arXiv:2404.16776 [pdf, other]
Title: Modeling Selective Feature Attention for Representation-based Siamese Text Matching
Comments: Accepted by IJCAI2024
Subjects: Computation and Language (cs.CL)

Representation-based Siamese networks have risen to popularity in lightweight text matching due to their low deployment and inference costs. While word-level attention mechanisms have been implemented within Siamese networks to improve performance, we propose Feature Attention (FA), a novel downstream block designed to enrich the modeling of dependencies among embedding features. Employing "squeeze-and-excitation" techniques, the FA block dynamically adjusts the emphasis on individual features, enabling the network to concentrate more on features that significantly contribute to the final classification. Building upon FA, we introduce a dynamic "selection" mechanism called Selective Feature Attention (SFA), which leverages a stacked BiGRU Inception structure. The SFA block facilitates multi-scale semantic extraction by traversing different stacked BiGRU layers, encouraging the network to selectively concentrate on semantic information and embedding features across varying levels of abstraction. Both the FA and SFA blocks offer a seamless integration capability with various Siamese networks, showcasing a plug-and-play characteristic. Experimental evaluations conducted across diverse text matching baselines and benchmarks underscore the indispensability of modeling feature attention and the superiority of the "selection" mechanism.

[362]  arXiv:2404.16778 [pdf, other]
Title: Unifying Asynchronous Logics for Hyperproperties
Subjects: Logic in Computer Science (cs.LO)

We introduce and investigate a powerful hyper logical framework in the linear-time setting, we call generalized HyperLTL with stuttering and contexts (GHyperLTL_SC for short). GHyperLTL_SC unifies known asynchronous extensions of HyperLTL and the well-known extension KLTL of LTL with knowledge modalities under both the synchronous and asynchronous perfect recall semantics. As a main contribution, we individuate a meaningful fragment of GHyperLTL_SC, we call simple GHyperLTL_SC, with a decidable model-checking problem, which is more expressive than HyperLTL and known fragments of asynchronous extensions of HyperLTL with a decidable model-checking problem. Simple GHyperLTL_SC subsumes KLTL under the synchronous semantics and the one-agent fragment of KLTL under the asynchronous semantics, and to the best of our knowledge, it represents the unique hyper logic with a decidable model-checking problem which can express powerful non-regular trace properties when interpreted on singleton sets of traces. We justify the relevance of simple GHyperLTL_SC by showing that it can express diagnosability properties, interesting classes of information-flow security policies, both in the synchronous and asynchronous settings, and bounded termination (more in general, global promptness in the style of Prompt LTL).

[363]  arXiv:2404.16779 [pdf, other]
Title: DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
Comments: ICLR 2024. Explore videos, data, code, and more at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-quality dense reward from sparse rewards and demonstrations if given. The learned rewards can be \textit{reused} in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. The learned rewards even achieve comparable performance to human-engineered rewards on some tasks. See our project page (https://sites.google.com/view/iclr24drs) for more details.

[364]  arXiv:2404.16781 [pdf, other]
Title: Registration by Regression (RbR): a framework for interpretable and flexible atlas registration
Comments: 11 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In human neuroimaging studies, atlas registration enables mapping MRI scans to a common coordinate frame, which is necessary to aggregate data from multiple subjects. Machine learning registration methods have achieved excellent speed and accuracy but lack interpretability. More recently, keypoint-based methods have been proposed to tackle this issue, but their accuracy is still subpar, particularly when fitting nonlinear transforms. Here we propose Registration by Regression (RbR), a novel atlas registration framework that is highly robust and flexible, conceptually simple, and can be trained with cheaply obtained data. RbR predicts the (x,y,z) atlas coordinates for every voxel of the input scan (i.e., every voxel is a keypoint), and then uses closed-form expressions to quickly fit transforms using a wide array of possible deformation models, including affine and nonlinear (e.g., Bspline, Demons, invertible diffeomorphic models, etc.). Robustness is provided by the large number of voxels informing the registration and can be further increased by robust estimators like RANSAC. Experiments on independent public datasets show that RbR yields more accurate registration than competing keypoint approaches, while providing full control of the deformation model.

[365]  arXiv:2404.16787 [pdf, other]
Title: Enhancing Quality of Experience in Telecommunication Networks: A Review of Frameworks and Machine Learning Algorithms
Comments: 13 pages and 16 figures
Subjects: Networking and Internet Architecture (cs.NI)

The Internet service provider industry is currently experiencing intense competition as companies strive to provide top-notch services to their customers. Providers are introducing cutting-edge technologies to enhance service quality, understanding that their survival depends on the level of service they offer. However, evaluating service quality is a complex task. A crucial aspect of this evaluation lies in understanding user experience, which significantly impacts the success and reputation of a service or product. Ensuring a seamless and positive user experience is essential for attracting and retaining customers. To date, much effort has been devoted to developing tools for measuring Quality of Experience (QoE), which incorporate both subjective and objective criteria. These tools, available in closed and open-source formats, are accessible to organizations and contribute to improving user experience quality. This review article delves into recent research and initiatives aimed at creating frameworks for assessing user QoE. It also explores the integration of machine learning algorithms to enhance these tools for future advancements. Additionally, the article examines current challenges and envisions future directions in the development of these measurement tools.

[366]  arXiv:2404.16789 [pdf, other]
Title: Continual Learning of Large Language Models: A Comprehensive Survey
Comments: 57 pages, 2 figures, 4 tables. Work in progress
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as "catastrophic forgetting". While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.

[367]  arXiv:2404.16790 [pdf, other]
Title: SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the real world, which are characterized by the presence of extensive texts embedded within images. Recently, the advent of MLLMs with impressive versatility has raised the bar for what we can expect from MLLMs. However, their proficiency in text-rich scenarios has yet to be comprehensively and objectively assessed, since current MLLM benchmarks primarily focus on evaluating general visual comprehension. In this work, we introduce SEED-Bench-2-Plus, a benchmark specifically designed for evaluating \textbf{text-rich visual comprehension} of MLLMs. Our benchmark comprises 2.3K multiple-choice questions with precise human annotations, spanning three broad categories: Charts, Maps, and Webs, each of which covers a wide spectrum of text-rich scenarios in the real world. These categories, due to their inherent complexity and diversity, effectively simulate real-world text-rich environments. We further conduct a thorough evaluation involving 34 prominent MLLMs (including GPT-4V, Gemini-Pro-Vision and Claude-3-Opus) and emphasize the current limitations of MLLMs in text-rich visual comprehension. We hope that our work can serve as a valuable addition to existing MLLM benchmarks, providing insightful observations and inspiring further research in the area of text-rich visual comprehension with MLLMs. The dataset and evaluation code can be accessed at https://github.com/AILab-CVC/SEED-Bench.

[368]  arXiv:2404.16792 [pdf, other]
Title: Weak-to-Strong Extrapolation Expedites Alignment
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, they are inevitably constrained by limited resources in reality. Suppose we have a moderately trained LLM (e.g., trained to align with human preference) in hand, can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO to boost LLMs' alignment with human preference. ExPO assumes that a medium-aligned model can be interpolated between a less-aligned (weaker) model, e.g., the initial SFT model, and a better-aligned (stronger) one, thereby directly obtaining this stronger model by extrapolating from the weights of the former two relatively weaker models. On the AlpacaEval 2.0 benchmark, we show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to reach and even surpass the fully-trained one, without any additional training. Furthermore, ExPO also significantly improves off-the-shelf DPO/RLHF models and exhibits decent scalability across model sizes from 7B to 70B. Our work demonstrates the efficacy of model extrapolation in exploiting LLMs' capabilities, suggesting a promising direction that deserves future exploration.

[369]  arXiv:2404.16793 [pdf, ps, other]
Title: A Communication- and Memory-Aware Model for Load Balancing Tasks
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

While load balancing in distributed-memory computing has been well-studied, we present an innovative approach to this problem: a unified, reduced-order model that combines three key components to describe "work" in a distributed system: computation, communication, and memory. Our model enables an optimizer to explore complex tradeoffs in task placement, such as increased parallelism at the expense of data replication, which increases memory usage. We propose a fully distributed, heuristic-based load balancing optimization algorithm, and demonstrate that it quickly finds close-to-optimal solutions. We formalize the complex optimization problem as a mixed-integer linear program, and compare it to our strategy. Finally, we show that when applied to an electromagnetics code, our approach obtains up to 2.3x speedups for the imbalanced execution.

[370]  arXiv:2404.16794 [pdf, other]
Title: Structure-Preserving Oscillation-Eliminating Discontinuous Galerkin Schemes for Ideal MHD Equations: Locally Divergence-Free and Positivity-Preserving
Comments: 60 pages
Subjects: Numerical Analysis (math.NA); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computational Physics (physics.comp-ph)

This paper develops structure-preserving, oscillation-eliminating discontinuous Galerkin (OEDG) schemes for ideal magnetohydrodynamics (MHD), as a sequel to our recent work [Peng, Sun, and Wu, OEDG: Oscillation-eliminating discontinuous Galerkin method for hyperbolic conservation laws, 2023]. The schemes are based on a locally divergence-free (LDF) oscillation-eliminating (OE) procedure to suppress spurious oscillations while maintaining many of the good properties of original DG schemes, such as conservation, local compactness, and optimal convergence rates. The OE procedure is built on the solution operator of a novel damping equation -- a simple linear ordinary differential equation (ODE) whose exact solution can be exactly formulated. Because this OE procedure does not interfere with DG spatial discretization and RK stage update, it can be easily incorporated to existing DG codes as an independent module. These features make the proposed LDF OEDG schemes highly efficient and easy to implement.In addition, we present a positivity-preserving (PP) analysis of the LDF OEDG schemes on Cartesian meshes via the optimal convex decomposition technique and the geometric quasi-linearization (GQL) approach. Efficient PP LDF OEDG schemes are obtained with the HLL flux under a condition accessible by the simple local scaling PP limiter.Several one- and two-dimensional MHD tests confirm the accuracy, effectiveness, and robustness of the proposed structure-preserving OEDG schemes.

[371]  arXiv:2404.16795 [pdf, other]
Title: In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
Subjects: Machine Learning (cs.LG)

With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods, strongly relying on black-box Bayesian optimization (BO), face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for Freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages the transformers' in-context learning ability to efficiently and reliably do Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. Furthermore, we show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO), yields new state-of-the-art performance in the same three families of deep learning HPO benchmarks considered in prior work.

[372]  arXiv:2404.16798 [pdf, other]
Title: A Test Problem for Flow Codes
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Fluid Dynamics (physics.flu-dyn)

We propose a test problem for Navier-Stokes solvers based on the flow around a cylinder. We choose a range of Reynolds numbers for which the flow is time-dependent but can be characterized as essentially two-dimensional. The test problem requires accurate resolution of chaotic dynamics over a long time interval. It also requires the use of a relatively large computational domain, part of which is curved, and it requires evaluation of derivatives of the solution and pressure on the curved boundary. We review the performance of different finite element methods for the proposed range of Reynolds numbers. These tests indicate that some of the most established methods do not capture the correct behavior.

[373]  arXiv:2404.16804 [pdf, other]
Title: AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
Comments: Accepted to CVPR 2024 Workshop on Prompting in Vision, Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improvement for unseen classes is still marginal, and to tackle this problem, data augmentation has been frequently used in traditional zero-shot learning techniques. Through our experiments, we have identified important issues in CoOp and CoCoOp: the context learned through traditional image augmentation is biased toward seen classes, negatively impacting generalization to unseen classes. To address this problem, we propose adversarial token embedding to disentangle low-level visual augmentation features from high-level class information when inducing bias in learnable prompts. Through our novel mechanism called "Adding Attributes to Prompt Learning", AAPL, we guide the learnable context to effectively extract text features by focusing on high-level features for unseen classes. We have conducted experiments across 11 datasets, and overall, AAPL shows favorable performances compared to the existing methods in few-shot learning, zero-shot learning, cross-dataset, and domain generalization tasks.

[374]  arXiv:2404.16807 [pdf, other]
Title: Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning
Comments: 16 pages, 6 figures
Subjects: Computation and Language (cs.CL)

Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge, while generating coherent sentences. Although the quality of the generated sentences is crucial, the diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts. Large Language Models (LLMs) have shown proficiency in enhancing the generation quality across various tasks through in-context learning (ICL) using given examples without the need for any fine-tuning. However, the diversity aspect in LLM outputs has not been systematically studied before. To address this, we propose a simple method that diversifies the LLM generations, while preserving their quality. Experimental results on three benchmark GCR datasets show that our method achieves an ideal balance between the quality and diversity. Moreover, the sentences generated by our proposed method can be used as training data to improve diversity in existing commonsense generators.

[375]  arXiv:2404.16811 [pdf, other]
Title: Make Your LLM Fully Utilize the Context
Comments: 19 pages, 7 figures, 3 tables, 9 examples
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: https://github.com/microsoft/FILM.

[376]  arXiv:2404.16812 [pdf, other]
Title: ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs
Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve the SLO hit rates 61%-80% while saving 47%-187% costs over prior work.

[377]  arXiv:2404.16814 [pdf, other]
Title: Meta-Transfer Derm-Diagnosis: Exploring Few-Shot Learning and Transfer Learning for Skin Disease Classification in Long-Tail Distribution
Comments: 17 pages, 5 figures, 6 tables, submitted to IEEE Journal of Biomedical and Health Informatics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Addressing the challenges of rare diseases is difficult, especially with the limited number of reference images and a small patient population. This is more evident in rare skin diseases, where we encounter long-tailed data distributions that make it difficult to develop unbiased and broadly effective models. The diverse ways in which image datasets are gathered and their distinct purposes also add to these challenges. Our study conducts a detailed examination of the benefits and drawbacks of episodic and conventional training methodologies, adopting a few-shot learning approach alongside transfer learning. We evaluated our models using the ISIC2018, Derm7pt, and SD-198 datasets. With minimal labeled examples, our models showed substantial information gains and better performance compared to previously trained models. Our research emphasizes the improved ability to represent features in DenseNet121 and MobileNetV2 models, achieved by using pre-trained models on ImageNet to increase similarities within classes. Moreover, our experiments, ranging from 2-way to 5-way classifications with up to 10 examples, showed a growing success rate for traditional transfer learning methods as the number of examples increased. The addition of data augmentation techniques significantly improved our transfer learning based model performance, leading to higher performances than existing methods, especially in the SD-198 and ISIC2018 datasets. All source code related to this work will be made publicly available soon at the provided URL.

[378]  arXiv:2404.16816 [pdf, other]
Title: IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
Subjects: Computation and Language (cs.CL)

As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 Billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set 29 of Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate a wide range of proprietary and open-source LLMs including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM and LLaMA on IndicGenBench in a variety of settings. The largest PaLM-2 models performs the best on most tasks, however, there is a significant performance gap in all languages compared to English showing that further research is needed for the development of more inclusive multilingual language models. IndicGenBench is released at www.github.com/google-research-datasets/indic-gen-bench

[379]  arXiv:2404.16818 [pdf, other]
Title: Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals
Comments: Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global categories within an image corpus without any form of annotation. Building upon recent advances in self-supervised representation learning, we focus on how to leverage these large pre-trained models for the downstream task of unsupervised segmentation. We present PriMaPs - Principal Mask Proposals - decomposing images into semantically meaningful masks based on their feature representation. This allows us to realize unsupervised semantic segmentation by fitting class prototypes to PriMaPs with a stochastic expectation-maximization algorithm, PriMaPs-EM. Despite its conceptual simplicity, PriMaPs-EM leads to competitive results across various pre-trained backbone models, including DINO and DINOv2, and across datasets, such as Cityscapes, COCO-Stuff, and Potsdam-3. Importantly, PriMaPs-EM is able to boost results when applied orthogonally to current state-of-the-art unsupervised semantic segmentation pipelines.

[380]  arXiv:2404.16820 [pdf, other]
Title: Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Comments: Data and code will be released at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

While text-to-image (T2I) generative models have become ubiquitous, they do not necessarily generate images that align with a given prompt. While previous work has evaluated T2I alignment by proposing metrics, benchmarks, and templates for collecting human judgements, the quality of these components is not systematically measured. Human-rated prompt sets are generally small and the reliability of the ratings -- and thereby the prompt set used to compare models -- is not evaluated. We address this gap by performing an extensive study evaluating auto-eval metrics and human templates. We provide three main contributions: (1) We introduce a comprehensive skills-based benchmark that can discriminate models across different human templates. This skills-based benchmark categorises prompts into sub-skills, allowing a practitioner to pinpoint not only which skills are challenging, but at what level of complexity a skill becomes challenging. (2) We gather human ratings across four templates and four T2I models for a total of >100K annotations. This allows us to understand where differences arise due to inherent ambiguity in the prompt and where they arise due to differences in metric and model quality. (3) Finally, we introduce a new QA-based auto-eval metric that is better correlated with human ratings than existing metrics for our new dataset, across different human templates, and on TIFA160.

[381]  arXiv:2404.16821 [pdf, other]
Title: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual understanding capabilities, and making it can be transferred and reused in different LLMs. (2) Dynamic High-Resolution: we divide images into tiles ranging from 1 to 40 of 448$\times$448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset that covers common scenes, document images, and annotated them with English and Chinese question-answer pairs, significantly enhancing performance in OCR- and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code has been released at https://github.com/OpenGVLab/InternVL.

[382]  arXiv:2404.16823 [pdf, other]
Title: Learning Visuotactile Skills with Two Multifingered Hands
Comments: Code and Project Website: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks which are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/ .

[383]  arXiv:2404.16824 [pdf, other]
Title: V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection
Subjects: Computer Vision and Pattern Recognition (cs.CV)

AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizability, singular function, and single modality focus. Combining the fragility of video-into-video steganography with deep robust watermarking, our method can embed invisible visual-audio localization watermarks and copyright watermarks into the original video frames and audio, enabling precise manipulation localization and copyright protection. We also design a temporal alignment and fusion module and degradation prompt learning to enhance the localization accuracy and decoding robustness. Meanwhile, we introduce a sample-level audio localization method and a cross-modal copyright extraction mechanism to couple the information of audio and video frames. The effectiveness of V2A-Mark has been verified on a visual-audio tampering dataset, emphasizing its superiority in localization precision and copyright accuracy, crucial for the sustainable development of video editing in the AIGC video era.

[384]  arXiv:2404.16825 [pdf, other]
Title: ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content viewed on head mounted displays (HMDs) is actually a rendered viewport instead of an ERP image. In this work, we emphasize that focusing solely on ERP quality results in inferior viewport visual experiences for users. Thus, we propose ResVR, which is the first comprehensive framework for the joint Rescaling and Viewport Rendering of ODIs. ResVR allows obtaining LR ERP images for transmission while rendering high-quality viewports for users to watch on HMDs. In our ResVR, a novel discrete pixel sampling strategy is developed to tackle the complex mapping between the viewport and ERP, enabling end-to-end training of ResVR pipeline. Furthermore, a spherical pixel shape representation technique is innovatively derived from spherical differentiation to significantly improve the visual quality of rendered viewports. Extensive experiments demonstrate that our ResVR outperforms existing methods in viewport rendering tasks across different fields of view, resolutions, and view directions while keeping a low transmission overhead.

[385]  arXiv:2404.16828 [pdf, other]
Title: Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with `time' serving as a supervisory signal since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a flexible transformer-based model for general-purpose ordering of image sequences of arbitrary length with built-in attribution maps. After training, the model successfully discovers and localizes monotonic changes while ignoring cyclic and stochastic ones. We demonstrate applications of the model in multiple video settings covering different scene and object types, discovering both object-level and environmental changes in unseen sequences. We also demonstrate that the attention-based attribution maps function as effective prompts for segmenting the changing regions, and that the learned representations can be used for downstream applications. Finally, we show that the model achieves the state of the art on standard benchmarks for ordering a set of images.

[386]  arXiv:2404.16829 [pdf, other]
Title: Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs), particularly GPT-4V, to present a novel approach, Make-it-Real: 1) We demonstrate that GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library. 2) Utilizing a combination of visual cues and hierarchical text prompts, GPT-4V precisely identifies and aligns materials with the corresponding components of 3D objects. 3) The correctly matched materials are then meticulously applied as reference for the new SVBRDF material generation according to the original diffuse map, significantly enhancing their visual authenticity. Make-it-Real offers a streamlined integration into the 3D content creation workflow, showcasing its utility as an essential tool for developers of 3D assets.

[387]  arXiv:2404.16831 [pdf, other]
Title: The Third Monocular Depth Estimation Challenge
Comments: To appear in CVPRW2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 submissions outperforming the baseline on the test set: 10 among them submitted a report describing their approach, highlighting a diffused use of foundational models such as Depth Anything at the core of their method. The challenge winners drastically improved 3D F-Score performance, from 17.51% to 23.72%.

Cross-lists for Fri, 26 Apr 24

[388]  arXiv:2306.03664 (cross-list from eess.AS) [pdf, other]
Title: Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification
Comments: accepted at INTERSPEECH 2023, 20th-24th August 2023, Dublin, Ireland
Journal-ref: Proc. INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)

Most state-of-the-art self-supervised speaker verification systems rely on a contrastive-based objective function to learn speaker representations from unlabeled speech data. We explore different ways to improve the performance of these methods by: (1) revisiting how positive and negative pairs are sampled through a "symmetric" formulation of the contrastive loss; (2) introducing margins similar to AM-Softmax and AAM-Softmax that have been widely adopted in the supervised setting. We demonstrate the effectiveness of the symmetric contrastive loss which provides more supervision for the self-supervised task. Moreover, we show that Additive Margin and Additive Angular Margin allow reducing the overall number of false negatives and false positives by improving speaker separability. Finally, by combining both techniques and training a larger model we achieve 7.50% EER and 0.5804 minDCF on the VoxCeleb1 test set, which outperforms other contrastive self supervised methods on speaker verification.

[389]  arXiv:2311.02833 (cross-list from eess.SY) [pdf, other]
Title: CESAR: Control Envelope Synthesis via Angelic Refinements
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)

This paper presents an approach for synthesizing provably correct control envelopes for hybrid systems. Control envelopes characterize families of safe controllers and are used to monitor untrusted controllers at runtime. Our algorithm fills in the blanks of a hybrid system's sketch specifying the desired shape of the control envelope, the possible control actions, and the system's differential equations. In order to maximize the flexibility of the control envelope, the synthesized conditions saying which control action can be chosen when should be as permissive as possible while establishing a desired safety condition from the available assumptions, which are augmented if needed. An implicit, optimal solution to this synthesis problem is characterized using hybrid systems game theory, from which explicit solutions can be derived via symbolic execution and sound, systematic game refinements. Optimality can be recovered in the face of approximation via a dual game characterization. The resulting algorithm, Control Envelope Synthesis via Angelic Refinements (CESAR), is demonstrated in a range of safe control synthesis examples with different control challenges.

[390]  arXiv:2312.07896 (cross-list from eess.SY) [pdf, other]
Title: TRACTOR: Traffic Analysis and Classification Tool for Open RAN
Comments: 6 pages, 5 figures, 2 tables, submitted to ICC 2024
Subjects: Systems and Control (eess.SY); Networking and Internet Architecture (cs.NI)

5G and beyond cellular networks promise remarkable advancements in bandwidth, latency, and connectivity. The emergence of Open Radio Access Network (O-RAN) represents a pivotal direction for the evolution of cellular networks, inherently supporting machine learning (ML) for network operation control. Within this framework, RAN Intelligence Controllers (RICs) from one provider can employ ML models developed by third-party vendors through the acquisition of key performance indicators (KPIs) from geographically distant base stations or user equipment (UE). Yet, the development of ML models hinges on the availability of realistic and robust datasets. In this study, we embark on a two-fold journey. First, we collect a comprehensive 5G dataset, harnessing real-world cell phones across diverse applications, locations, and mobility scenarios. Next, we replicate this traffic within a full-stack srsRAN-based O-RAN framework on Colosseum, the world's largest radio frequency (RF) emulator. This process yields a robust and O-RAN compliant KPI dataset mirroring real-world conditions. We illustrate how such a dataset can fuel the training of ML models and facilitate the deployment of xApps for traffic slice classification by introducing a CNN based classifier that achieves accuracy $>95\%$ offline and $92\%$ online. To accelerate research in this domain, we provide open-source access to our toolchain and supplementary utilities, empowering the broader research community to expedite the creation of realistic and O-RAN compliant datasets.

[391]  arXiv:2404.13101 (cross-list from eess.IV) [pdf, ps, other]
Title: DensePANet: An improved generative adversarial network for photoacoustic tomography image reconstruction from sparse data
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)

Image reconstruction is an essential step of every medical imaging method, including Photoacoustic Tomography (PAT), which is a promising modality of imaging, that unites the benefits of both ultrasound and optical imaging methods. Reconstruction of PAT images using conventional methods results in rough artifacts, especially when applied directly to sparse PAT data. In recent years, generative adversarial networks (GANs) have shown a powerful performance in image generation as well as translation, rendering them a smart choice to be applied to reconstruction tasks. In this study, we proposed an end-to-end method called DensePANet to solve the problem of PAT image reconstruction from sparse data. The proposed model employs a novel modification of UNet in its generator, called FD-UNet++, which considerably improves the reconstruction performance. We evaluated the method on various in-vivo and simulated datasets. Quantitative and qualitative results show the better performance of our model over other prevalent deep learning techniques.

[392]  arXiv:2404.15076 (cross-list from eess.SY) [pdf, other]
Title: Securing O-RAN Open Interfaces
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)

The next generation of cellular networks will be characterized by openness, intelligence, virtualization, and distributed computing. The Open Radio Access Network (Open RAN) framework represents a significant leap toward realizing these ideals, with prototype deployments taking place in both academic and industrial domains. While it holds the potential to disrupt the established vendor lock-ins, Open RAN's disaggregated nature raises critical security concerns. Safeguarding data and securing interfaces must be integral to Open RAN's design, demanding meticulous analysis of cost/benefit tradeoffs.
In this paper, we embark on the first comprehensive investigation into the impact of encryption on two pivotal Open RAN interfaces: the E2 interface, connecting the base station with a near-real-time RAN Intelligent Controller, and the Open Fronthaul, connecting the Radio Unit to the Distributed Unit. Our study leverages a full-stack O-RAN ALLIANCE compliant implementation within the Colosseum network emulator and a production-ready Open RAN and 5G-compliant private cellular network. This research contributes quantitative insights into the latency introduced and throughput reduction stemming from using various encryption protocols. Furthermore, we present four fundamental principles for constructing security by design within Open RAN systems, offering a roadmap for navigating the intricate landscape of Open RAN security.

[393]  arXiv:2404.15405 (cross-list from astro-ph.SR) [pdf, ps, other]
Title: Photometry of Saturated Stars with Machine Learning
Authors: Dominek Winecki (1) Christopher S. Kochanek (2) ((1) Dept. of Computer Science and Engineeering, The Ohio State University (2) Dept. of Astronomy, The Ohio State University)
Comments: submitted to ApJ
Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV)

We develop a deep neural network (DNN) to obtain photometry of saturated stars in the All-Sky Automated Survey for Supernovae (ASAS-SN). The DNN can obtain unbiased photometry for stars from g=4 to 14 mag with a dispersion (15%-85% 1sigma range around median) of 0.12 mag for saturated (g<11.5 mag) stars. More importantly, the light curve of a non-variable saturated star has a median dispersion of only 0.037 mag. The DNN light curves are, in many cases, spectacularly better than provided by the standard ASAS-SN pipelines. While the network was trained on g band data from only one of ASAS-SN's 20 cameras, initial experiments suggest that it can be used for any camera and the older ASAS-SN V band data as well. The dominant problems seem to be associated with correctable issues in the ASAS-SN data reduction pipeline for saturated stars more than the DNN itself. The method is publicly available as a light curve option on ASAS-SN Sky Patrol v1.0.

[394]  arXiv:2404.16040 (cross-list from q-bio.NC) [pdf, ps, other]
Title: Pilot Study to Discover Candidate Biomarkers for Autism based on Perception and Production of Facial Expressions
Comments: 18 pages, 3 figures, 5 tables
Subjects: Neurons and Cognition (q-bio.NC); Human-Computer Interaction (cs.HC)

Purpose: Facial expression production and perception in autism spectrum disorder (ASD) suggest potential presence of behavioral biomarkers that may stratify individuals on the spectrum into prognostic or treatment subgroups. Construct validity and group discriminability have been recommended as criteria for identification of candidate stratification biomarkers.
Methods: In an online pilot study of 11 children and young adults diagnosed with ASD and 11 age- and gender-matched neurotypical (NT) individuals, participants recognize and mimic static and dynamic facial expressions of 3D avatars. Webcam-based eye-tracking (ET) and facial video tracking (VT), including activation and asymmetry of action units (AUs) from the Facial Action Coding System (FACS) are collected. We assess validity of constructs for each dependent variable (DV) based on the expected response in the NT group. Then, the Boruta statistical method identifies DVs that are significant to group discriminability (ASD or NT).
Results: We identify one candidate ET biomarker (percentage gaze duration to the face while mimicking static 'disgust' expression) and 14 additional DVs of interest for future study, including 4 ET DVs, 5 DVs related to VT AU activation, and 4 DVs related to AU asymmetry in VT. Based on a power analysis, we provide sample size recommendations for future studies.
Conclusion: This pilot study provides a framework for ASD stratification biomarker discovery based on perception and production of facial expressions.

[395]  arXiv:2404.16049 (cross-list from physics.med-ph) [pdf, other]
Title: Exploring the limitations of blood pressure estimation using the photoplethysmography signal
Comments: 17 pages, 7 figures, 3 tables
Subjects: Medical Physics (physics.med-ph); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)

Hypertension, a leading contributor to cardiovascular morbidity, underscores the need for accurate and continuous blood pressure (BP) monitoring. Photoplethysmography (PPG) presents a promising approach to this end. However, the precision of BP estimates derived from PPG signals has been the subject of ongoing debate, necessitating a comprehensive evaluation of their effectiveness and constraints. We developed a calibration-based Siamese ResNet model for BP estimation, using a signal input paired with a reference BP reading. We compared the use of normalized PPG (N-PPG) against the normalized Invasive Arterial Blood Pressure (N-IABP) signals as input. The N-IABP signals do not directly present systolic and diastolic values but theoretically provide a more accurate BP measure than PPG signals since it is a direct pressure sensor inside the body. Our strategy establishes a critical benchmark for PPG performance, realistically calibrating expectations for PPG's BP estimation capabilities. Nonetheless, we compared the performance of our models using different signal-filtering conditions to evaluate the impact of filtering on the results. We evaluated our method using the AAMI and the BHS standards employing the VitalDB dataset. The N-IABP signals meet with AAMI standards for both Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), with errors of 1.29+-6.33mmHg for systolic pressure and 1.17+-5.78mmHg for systolic and diastolic pressure respectively for the raw N-IABP signal. In contrast, N-PPG signals, in their best setup, exhibited inferior performance than N-IABP, presenting 1.49+-11.82mmHg and 0.89+-7.27mmHg for systolic and diastolic pressure respectively. Our findings highlight the potential and limitations of employing PPG for BP estimation, showing that these signals contain information correlated to BP but may not be sufficient for predicting it accurately.

[396]  arXiv:2404.16080 (cross-list from eess.IV) [pdf, other]
Title: Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Reflectance Confocal Microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides virtual high-resolution images of the skin and superficial tissues, reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue, capturing the reflected light to generate detailed images of microscopic structures at various depths. Recent studies explored AI and machine learning, particularly CNNs, for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, empowering dermatologists in effective image interpretation and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.

[397]  arXiv:2404.16101 (cross-list from quant-ph) [pdf, other]
Title: Multivariate Fidelities
Comments: 79 pages, 1 figure
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph); Operator Algebras (math.OA)

The main contribution of our paper is to introduce a number of multivariate quantum fidelities and show that they satisfy several desirable properties that are natural extensions of those of the Uhlmann and Holevo fidelities. We propose three variants that reduce to the average pairwise fidelity for commuting states: average pairwise $z$-fidelities, the multivariate semi-definite programming (SDP) fidelity, and a multivariate fidelity inspired by an existing secrecy measure. The second one is obtained by extending the SDP formulation of the Uhlmann fidelity to more than two states. All three of these variants satisfy the following properties: (i) reduction to multivariate classical fidelities for commuting states, (ii) the data-processing inequality, (iii) invariance under permutations of the states, (iv) its values are in the interval $[0,1]$; they are faithful, that is, their values are equal to one if and only if all the states are equal, and they satisfy orthogonality, that is their values are equal to zero if and only if the states are mutually orthogonal to each other, (v) direct-sum property, (vi) joint concavity, and (vii) uniform continuity bounds under certain conditions. Furthermore, we establish inequalities relating these different variants, indeed clarifying that all these definitions coincide with the average pairwise fidelity for commuting states. Lastly, we introduce another multivariate fidelity called multivariate log-Euclidean fidelity, which is a quantum generalization of the Matusita multivariate fidelity. We also show that it satisfies most of the desirable properties listed above, it is a function of a multivariate log-Euclidean divergence, and has an operational interpretation in terms of quantum hypothesis testing with an arbitrarily varying null hypothesis.

[398]  arXiv:2404.16104 (cross-list from eess.AS) [pdf, other]
Title: Evolution of Voices in French Audiovisual Media Across Genders and Age in a Diachronic Perspective
Comments: 5 pages, 2 figures, keywords:, Gender, Diachrony, Vocal Tract Resonance, Vocal register, Broadcast speech
Journal-ref: Radek Skarnitzl & Jan Vol\'in (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague 2023, pp. 753-757. Guarant International. ISBN 978-80-908 114-2-3
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

We present a diachronic acoustic analysis of the voice of 1023 speakers from French media archives. The speakers are spread across 32 categories based on four periods (years 1955/56, 1975/76, 1995/96, 2015/16), four age groups (20-35; 36-50; 51-65, >65), and two genders. The fundamental frequency ($F_0$) and the first four formants (F1-4) were estimated. Procedures used to ensure the quality of these estimations on heterogeneous data are described. From each speaker's $F_0$ distribution, the base-$F_0$ value was calculated to estimate the register. Average vocal tract length was estimated from formant frequencies. Base-$F_0$ and vocal tract length were fit by linear mixed models to evaluate how they may have changed across time periods and genders, corrected for age effects. Results show an effect of the period with a tendency to lower voices, independently of gender. A lowering of pitch is observed with age for female but not male speakers.

[399]  arXiv:2404.16141 (cross-list from physics.flu-dyn) [pdf, other]
Title: Machine-Learned Closure of URANS for Stably Stratified Turbulence: Connecting Physical Timescales & Data Hyperparameters of Deep Time-Series Models
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)

We develop time-series machine learning (ML) methods for closure modeling of the Unsteady Reynolds Averaged Navier Stokes (URANS) equations applied to stably stratified turbulence (SST). SST is strongly affected by fine balances between forces and becomes more anisotropic in time for decaying cases. Moreover, there is a limited understanding of the physical phenomena described by some of the terms in the URANS equations. Rather than attempting to model each term separately, it is attractive to explore the capability of machine learning to model groups of terms, i.e., to directly model the force balances. We consider decaying SST which are homogeneous and stably stratified by a uniform density gradient, enabling dimensionality reduction. We consider two time-series ML models: Long Short-Term Memory (LSTM) and Neural Ordinary Differential Equation (NODE). Both models perform accurately and are numerically stable in a posteriori tests. Furthermore, we explore the data requirements of the ML models by extracting physically relevant timescales of the complex system. We find that the ratio of the timescales of the minimum information required by the ML models to accurately capture the dynamics of the SST corresponds to the Reynolds number of the flow. The current framework provides the backbone to explore the capability of such models to capture the dynamics of higher-dimensional complex SST flows.

[400]  arXiv:2404.16156 (cross-list from quant-ph) [pdf, other]
Title: Guardians of the Quantum GAN
Comments: 11 pages, 10 figures
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN allowing us to trace the specific quantum hardware used during training hence providing strong proof of ownership. To further enhance the security robustness, we propose the training of qGANs on a sequence of multiple quantum hardware, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN validating the authenticity of the model. We note that the watermark signature is robust against inferencing on hardware different than the hardware that was used for training. We obtain watermark extraction accuracy of 100% and ~90% for training the qGAN on individual and multiple quantum hardware setups (and inferencing on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well.

[401]  arXiv:2404.16196 (cross-list from q-bio.QM) [pdf, other]
Title: ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)

The global decline in bee populations poses significant risks to agriculture, biodiversity, and environmental stability. To bridge the gap in existing data, we introduce ApisTox, a comprehensive dataset focusing on the toxicity of pesticides to honey bees (Apis mellifera). This dataset combines and leverages data from existing sources such as ECOTOX and PPDB, providing an extensive, consistent, and curated collection that surpasses the previous datasets. ApisTox incorporates a wide array of data, including toxicity levels for chemicals, details such as time of their publication in literature, and identifiers linking them to external chemical databases. This dataset may serve as an important tool for environmental and agricultural research, but also can support the development of policies and practices aimed at minimizing harm to bee populations. Finally, ApisTox offers a unique resource for benchmarking molecular property prediction methods on agrochemical compounds, facilitating advancements in both environmental science and cheminformatics. This makes it a valuable tool for both academic research and practical applications in bee conservation.

[402]  arXiv:2404.16197 (cross-list from q-bio.MN) [pdf, other]
Title: On Hybrid Gene Regulatory Networks
Subjects: Molecular Networks (q-bio.MN); Computational Engineering, Finance, and Science (cs.CE)

In this work, we study a class of hybrid dynamical systems called hybrid gene regulatory networks (HGRNs) which was proposed to model gene regulatory networks. In HGRNs, there exist well-behaved trajectories that reach a fixed point or converge to a limit cycle, as well as chaotic trajectories that behave non-periodic or indeterministic. In our work, we investigate these irregular behaviors of HGRNs and present theoretical results about the decidability of the reachability problem, the probability of indeterministic behavior of HGRNs, and chaos especially in 2-dimensional HGRNs.

[403]  arXiv:2404.16204 (cross-list from quant-ph) [pdf, other]
Title: Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, by mainly restricting the attention to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as inter-network resource. Specifically, we consider the interconnection of different Quantum Local Area Networks (QLANs), and we show that multipartite entanglement allows to dynamically generate an inter-QLAN artificial topology, by means of local operations only, that overcomes the limitations of the physical QLAN topologies. To this aim, we first design the multipartite entangled state to be distributed within each QLAN. Then, we show how such a state can be engineered to: i) interconnect nodes belonging to different QLANs, and ii) dynamically adapt to different inter-QLAN traffic patterns. Our contribution aims at providing the network engineering community with a hands-on guideline towards the concept of artificial topology and artificial neighborhood.

[404]  arXiv:2404.16209 (cross-list from stat.ME) [pdf, ps, other]
Title: Exploring Spatial Context: A Comprehensive Bibliography of GWR and MGWR
Comments: 372 pages
Subjects: Methodology (stat.ME); Mathematical Software (cs.MS); Applications (stat.AP); Computation (stat.CO)

Local spatial models such as Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR) serve as instrumental tools to capture intrinsic contextual effects through the estimates of the local intercepts and behavioral contextual effects through estimates of the local slope parameters. GWR and MGWR provide simple implementation yet powerful frameworks that could be extended to various disciplines that handle spatial data. This bibliography aims to serve as a comprehensive compilation of peer-reviewed papers that have utilized GWR or MGWR as a primary analytical method to conduct spatial analyses and acts as a useful guide to anyone searching the literature for previous examples of local statistical modeling in a wide variety of application fields.

[405]  arXiv:2404.16269 (cross-list from math.OC) [pdf, other]
Title: Expected Time-Optimal Control: a Particle MPC-based Approach via Sequential Convex Programming
Comments: submitted to CDC 2024
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In this paper, we consider the problem of minimum-time optimal control for a dynamical system with initial state uncertainties and propose a sequential convex programming (SCP) solution framework. We seek to minimize the expected terminal (mission) time, which is an essential capability for planetary exploration missions where ground rovers have to carry out scientific tasks efficiently within the mission timelines in uncertain environments. Our main contribution is to convert the underlying stochastic optimal control problem into a deterministic, numerically tractable, optimal control problem. To this end, the proposed solution framework combines two strategies from previous methods: i) a partial model predictive control with consensus horizon approach and ii) a sum-of-norm cost, a temporally strictly increasing weighted-norm, promoting minimum-time trajectories. Our contribution is to adopt these formulations into an SCP solution framework and obtain a numerically tractable stochastic control algorithm. We then demonstrate the resulting control method in multiple applications: i) a closed-loop linear system as a representative result (a spacecraft double integrator model), ii) an open-loop linear system (the same model), and then iii) a nonlinear system (Dubin's car).

[406]  arXiv:2404.16287 (cross-list from stat.ML) [pdf, other]
Title: Differentially Private Federated Learning: Servers Trustworthiness, Estimation, and Statistical Inference
Comments: 56 pages, 3 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)

Differentially private federated learning is crucial for maintaining privacy in distributed environments. This paper investigates the challenges of high-dimensional estimation and inference under the constraints of differential privacy. First, we study scenarios involving an untrusted central server, demonstrating the inherent difficulties of accurate estimation in high-dimensional problems. Our findings indicate that the tight minimax rates depends on the high-dimensionality of the data even with sparsity assumptions. Second, we consider a scenario with a trusted central server and introduce a novel federated estimation algorithm tailored for linear regression models. This algorithm effectively handles the slight variations among models distributed across different machines. We also propose methods for statistical inference, including coordinate-wise confidence intervals for individual parameters and strategies for simultaneous inference. Extensive simulation experiments support our theoretical advances, underscoring the efficacy and reliability of our approaches.

[407]  arXiv:2404.16328 (cross-list from stat.ML) [pdf, other]
Title: Distributionally Robust Safe Screening
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this study, we propose a method Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a DR covariate shift setting. This method effectively combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples and features prior to model training. The core concept of the DRSS method involves reformulating the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, the DRSS method is capable of reliably identifying unnecessary samples and features under any future distribution within a specified range. We provide a theoretical guarantee of the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.

[408]  arXiv:2404.16340 (cross-list from math.CO) [pdf, ps, other]
Title: Vertex Ranking of Degenerate Graphs
Comments: 15 pages, zero figures
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM)

An $\ell$-vertex-ranking of a graph $G$ is a colouring of the vertices of $G$ with integer colours so that in any connected subgraph $H$ of $G$ with diameter at most $\ell$, there is a vertex in $H$ whose colour is larger than that of every other vertex in $H$. The $\ell$-vertex-ranking number, $\chi_{\ell-\mathrm{vr}}(G)$, of $G$ is the minimum integer $k$ such that $G$ has an $\ell$-vertex-ranking using $k$ colours. We prove that, for any fixed $d$ and $\ell$, every $d$-degenerate $n$-vertex graph $G$ satisfies $\chi_{\ell-\mathrm{vr}}(G)= O(n^{1-2/(\ell+1)}\log n)$ if $\ell$ is even and $\chi_{\ell-\mathrm{vr}}(G)= O(n^{1-2/\ell}\log n)$ if $\ell$ is odd. The case $\ell=2$ resolves (up to the $\log n$ factor) an open problem posed by \citet{karpas.neiman.ea:on} and the cases $\ell\in\{2,3\}$ are asymptotically optimal (up to the $\log n$ factor).

[409]  arXiv:2404.16346 (cross-list from eess.IV) [pdf, other]
Title: Light-weight Retinal Layer Segmentation with Global Reasoning
Comments: IEEE Transactions on Instrumentation & Measurement
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noises presented in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications. Therefore, it is desired to design a light-weight network with high performance for retinal layer segmentation. In this paper, we propose LightReSeg for retinal layer segmentation which can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure, where the encoder part employs multi-scale feature extraction and a Transformer block for fully exploiting the semantic information of feature maps at all scales and making the features have better global reasoning capabilities, while the decoder part, we design a multi-scale asymmetric attention (MAA) module for preserving the semantic information at each encoder scale. The experiments show that our approach achieves a better segmentation performance compared to the current state-of-the-art method TransUnet with 105.7M parameters on both our collected dataset and two other public datasets, with only 3.3M parameters.

[410]  arXiv:2404.16357 (cross-list from q-bio.NC) [pdf, other]
Title: Reverse engineering the brain input: Network control theory to identify cognitive task-related control nodes
Subjects: Neurons and Cognition (q-bio.NC); Systems and Control (eess.SY)

The human brain receives complex inputs when performing cognitive tasks, which range from external inputs via the senses to internal inputs from other brain regions. However, the explicit inputs to the brain during a cognitive task remain unclear. Here, we present an input identification framework for reverse engineering the control nodes and the corresponding inputs to the brain. The framework is verified with synthetic data generated by a predefined linear system, indicating it can robustly reconstruct data and recover the inputs. Then we apply the framework to the real motor-task fMRI data from 200 human subjects. Our results show that the model with sparse inputs can reconstruct neural dynamics in motor tasks ($EV=0.779$) and the identified 28 control nodes largely overlap with the motor system. Underpinned by network control theory, our framework offers a general tool for understanding brain inputs.

[411]  arXiv:2404.16397 (cross-list from eess.IV) [pdf, other]
Title: Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology
Comments: Paper accepted at the First Workshop on Imageomics (Imageomics-AAAI-24) - Discovering Biological Knowledge from Images using AI (this https URL), held as part of the 38th Annual AAAI Conference on Artificial Intelligence (this https URL)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

The interactions between tumor cells and the tumor microenvironment (TME) dictate therapeutic efficacy of radiation and many systemic therapies in breast cancer. However, to date, there is not a widely available method to reproducibly measure tumor and immune phenotypes for each patient's tumor. Given this unmet clinical need, we applied multiple instance learning (MIL) algorithms to assess activity of ten biologically relevant pathways from the hematoxylin and eosin (H&E) slide of primary breast tumors. We employed different feature extraction approaches and state-of-the-art model architectures. Using binary classification, our models attained area under the receiver operating characteristic (AUROC) scores above 0.70 for nearly all gene expression pathways and on some cases, exceeded 0.80. Attention maps suggest that our trained models recognize biologically relevant spatial patterns of cell sub-populations from H&E. These efforts represent a first step towards developing computational H&E biomarkers that reflect facets of the TME and hold promise for augmenting precision oncology.

[412]  arXiv:2404.16417 (cross-list from quant-ph) [pdf, other]
Title: Constructing Optimal Noise Channels for Enhanced Robustness in Quantum Machine Learning
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

With the rapid advancement of Quantum Machine Learning (QML), the critical need to enhance security measures against adversarial attacks and protect QML models becomes increasingly evident. In this work, we outline the connection between quantum noise channels and differential privacy (DP), by constructing a family of noise channels which are inherently $\epsilon$-DP: $(\alpha, \gamma)$-channels. Through this approach, we successfully replicate the $\epsilon$-DP bounds observed for depolarizing and random rotation channels, thereby affirming the broad generality of our framework. Additionally, we use a semi-definite program to construct an optimally robust channel. In a small-scale experimental evaluation, we demonstrate the benefits of using our optimal noise channel over depolarizing noise, particularly in enhancing adversarial accuracy. Moreover, we assess how the variables $\alpha$ and $\gamma$ affect the certifiable robustness and investigate how different encoding methods impact the classifier's robustness.

[413]  arXiv:2404.16424 (cross-list from physics.ins-det) [pdf, ps, other]
Title: Reciprocity in laser ultrasound revisited: Is wavefield characterisation by scanning laser excitation strictly reciprocal to that by scanning laser detection?
Comments: 20 pages, 15 figures
Subjects: Instrumentation and Detectors (physics.ins-det); Systems and Control (eess.SY); Optics (physics.optics)

The common believe about strict measurement reciprocity between scanning laser detection and scanning laser excitation is disproved by a simple experiment. Nevertheless, a deeper study based on the reciprocity relation reveals correct reciprocal measurement set-ups for both the probe-excitation / laser-detection and the laser-excitation / probe-detection case. Similarly, the all-laser measurement, that is thermoelastic laser excitation with laser vibrometer detection, is not in general reciprocal with respect to the exchange of excitation and detection positions. Again, a substitute for the laser doppler vibrometer out-of-plane displacement measurement was found which ensures measurement reciprocity together with laser excitation. The apparent confusion in literature about strict validity/non-validity of measurement reciprocity is mitigated by classifying the measurement situations systematically.

[414]  arXiv:2404.16450 (cross-list from math.NT) [pdf, ps, other]
Title: Unconditional correctness of recent quantum algorithms for factoring and computing discrete logarithms
Authors: Cédric Pilatte
Comments: 22 pages
Subjects: Number Theory (math.NT); Computational Complexity (cs.CC); Quantum Physics (quant-ph)

In 1994, Shor introduced his famous quantum algorithm to factor integers and compute discrete logarithms in polynomial time. In 2023, Regev proposed a multi-dimensional version of Shor's algorithm that requires far fewer quantum gates. His algorithm relies on a number-theoretic conjecture on the elements in $(\mathbb{Z}/N\mathbb{Z})^{\times}$ that can be written as short products of very small prime numbers. We prove a version of this conjecture using tools from analytic number theory such as zero-density estimates. As a result, we obtain an unconditional proof of correctness of this improved quantum algorithm and of subsequent variants.

[415]  arXiv:2404.16463 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum-assisted trustworthiness for the Quantum Internet
Comments: 7 pages, 5 figures, 1 table and 15 references
Subjects: Quantum Physics (quant-ph); Networking and Internet Architecture (cs.NI)

Device redundancy is one of the most well-known mechanisms in distributed systems to increase the overall system fault tolerance and, consequently, trustworthiness. Existing algorithms in this regard aim to exchange a significant number of messages among nodes to identify and agree which communication links or nodes are faulty. This approach greatly degrades the performance of those wireless communication networks exposed to limited available bandwidth and/or energy consumption due to messages flooding. Lately, quantum-assisted mechanisms have been envisaged as an appealing alternative to improve the performance in this kind of communication networks and have been shown to obtain levels of performance close to the ones achieved in ideal conditions. The purpose of this paper is to further explore this approach by using super-additivity and superposed quantum trajectories in quantum Internet to obtain a higher system trustworthiness. More specifically, the wireless communication network that supports the permafrost telemetry service for the Antarctica together with five operational modes (three of them using classical techniques and two of them using quantum-assisted mechanisms) have been simulated. Obtained results show that the new quantum-assisted mechanisms can increase the system performance by up to a 28%.

[416]  arXiv:2404.16482 (cross-list from q-bio.NC) [pdf, other]
Title: CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations
Subjects: Neurons and Cognition (q-bio.NC); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

A central question for cognitive science is to understand how humans process visual objects, i.e, to uncover human low-dimensional concept representation space from high-dimensional visual stimuli. Generating visual stimuli with controlling concepts is the key. However, there are currently no generative models in AI to solve this problem. Here, we present the Concept based Controllable Generation (CoCoG) framework. CoCoG consists of two components, a simple yet efficient AI agent for extracting interpretable concept and predicting human decision-making in visual similarity judgment tasks, and a conditional generation model for generating visual stimuli given the concepts. We quantify the performance of CoCoG from two aspects, the human behavior prediction accuracy and the controllable generation ability. The experiments with CoCoG indicate that 1) the reliable concept embeddings in CoCoG allows to predict human behavior with 64.07\% accuracy in the THINGS-similarity dataset; 2) CoCoG can generate diverse objects through the control of concepts; 3) CoCoG can manipulate human similarity judgment behavior by intervening key concepts. CoCoG offers visual objects with controlling concepts to advance our understanding of causality in human cognition. The code of CoCoG is available at \url{https://github.com/ncclab-sustech/CoCoG}.

[417]  arXiv:2404.16522 (cross-list from eess.IV) [pdf, other]
Title: A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classifying 2D echocardiography data into five distinct echocardiographic views: apical 4-chamber, parasternal long axis of left ventricle, parasternal short axis at levels of the mitral valve, papillary muscle, and apex. It then extracts features of each view separately and combines five features for disease classification. A total of 212 patients diagnosed with HCM, and 30 patients diagnosed with CA, along with 200 individuals with normal cardiac function(Normal), were enrolled in this study from 2018 to 2022. This approach achieved a precision, recall of 0.905, and micro-F1 score of 0.904, demonstrating its effectiveness in accurately identifying HCM and CA using a multi-view analysis.

[418]  arXiv:2404.16547 (cross-list from eess.AS) [pdf, other]
Title: Developing Acoustic Models for Automatic Speech Recognition in Swedish
Authors: Giampiero Salvi
Comments: 16 pages, 7 figures
Journal-ref: European Student Journal of Language and Speech, 1999
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)

This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digits and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context independent models and two variations of context dependent models. Furthermore many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age and dialect has also been examined. Results are compared to previous similar studies showing a remarkable improvement.

[419]  arXiv:2404.16560 (cross-list from stat.ML) [pdf, ps, other]
Title: Automated Model Selection for Generalized Linear Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

In this paper, we show how mixed-integer conic optimization can be used to combine feature subset selection with holistic generalized linear models to fully automate the model selection process. Concretely, we directly optimize for the Akaike and Bayesian information criteria while imposing constraints designed to deal with multicollinearity in the feature selection task. Specifically, we propose a novel pairwise correlation constraint that combines the sign coherence constraint with ideas from classical statistical models like Ridge regression and the OSCAR model.

[420]  arXiv:2404.16564 (cross-list from eess.IV) [pdf, other]
Title: Deep learning-based blind image super-resolution with iterative kernel reconstruction and noise estimation
Comments: 17 pages, 13 figures. The code of this paper is available in github: this https URL
Journal-ref: Computer Vision and Image Understanding, Volume 233, 2023, 103718
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)

Blind single image super-resolution (SISR) is a challenging task in image processing due to the ill-posed nature of the inverse problem. Complex degradations present in real life images make it difficult to solve this problem using na\"ive deep learning approaches, where models are often trained on synthetically generated image pairs. Most of the effort so far has been focused on solving the inverse problem under some constraints, such as for a limited space of blur kernels and/or assuming noise-free input images. Yet, there is a gap in the literature to provide a well-generalized deep learning-based solution that performs well on images with unknown and highly complex degradations. In this paper, we propose IKR-Net (Iterative Kernel Reconstruction Network) for blind SISR. In the proposed approach, kernel and noise estimation and high-resolution image reconstruction are carried out iteratively using dedicated deep models. The iterative refinement provides significant improvement in both the reconstructed image and the estimated blur kernel even for noisy inputs. IKR-Net provides a generalized solution that can handle any type of blur and level of noise in the input low-resolution image. IKR-Net achieves state-of-the-art results in blind SISR, especially for noisy images with motion blur.

[421]  arXiv:2404.16647 (cross-list from physics.optics) [pdf, ps, other]
Title: Application of RESNET50 Convolution Neural Network for the Extraction of Optical Parameters in Scattering Media
Subjects: Optics (physics.optics); Machine Learning (cs.LG)

Estimation of the optical properties of scattering media such as tissue is important in diagnostics as well as in the development of techniques to image deeper. As light penetrates the sample scattering events occur that alter the propagation direction of the photons in a random manner leading degradation of image quality. The distribution of the scattered light does, however, give a measure of the optical properties such as the reduced scattering coefficient and the absorption coefficient. Unfortunately, inverting scattering patterns to recover the optical properties is not simple, especially in the regime where the light is partially randomized. Machine learning has been proposed by several authors as a means of recovering these properties from either the back scattered or the transmitted light. In the present paper, we train a general purpose convolutional neural network RESNET 50 with simulated data based on Monte Carlo simulations. We show that compared with previous work our approach gives comparable or better reconstruction accuracy with training on a much smaller dataset. Moreover, by training on multiple parameters such as the intensity distribution at multiple planes or the exit angle and spatial distribution one achieves improved performance compared to training on a single input such as the intensity distribution captured at the sample surface. While our approach gives good parameter reconstruction, we identify factors that limit the accuracy of the recovered properties, particularly the absorption coefficient. In the light of these limitations, we suggest how the present approach may be enhanced for even better performance.

[422]  arXiv:2404.16664 (cross-list from q-bio.NC) [pdf, other]
Title: Lu.i -- A low-cost electronic neuron for education and outreach
Subjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)

With an increasing presence of science throughout all parts of society, there is a rising expectation for researchers to effectively communicate their work and, equally, for teachers to discuss contemporary findings in their classrooms. While the community can resort to an established set of teaching aids for the fundamental concepts of most natural sciences, there is a need for similarly illustrative experiments and demonstrators in neuroscience. We therefore introduce Lu.i: a parametrizable electronic implementation of the leaky-integrate-and-fire neuron model in an engaging form factor. These palm-sized neurons can be used to visualize and experience the dynamics of individual cells and small spiking neural networks. When stimulated with real or simulated sensory input, Lu.i demonstrates brain-inspired information processing in the hands of a student. As such, it is actively used at workshops, in classrooms, and for science communication. As a versatile tool for teaching and outreach, Lu.i nurtures the comprehension of neuroscience research and neuromorphic engineering among future generations of scientists and in the general public.

[423]  arXiv:2404.16696 (cross-list from q-bio.NC) [pdf, other]
Title: Report on Candidate Computational Indicators for Conscious Valenced Experience
Authors: Andres Campero
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI)

This report enlists 13 functional conditions cashed out in computational terms that have been argued to be constituent of conscious valenced experience. These are extracted from existing empirical and theoretical literature on, among others, animal sentience, medical disorders, anaesthetics, philosophy, evolution, neuroscience, and artificial intelligence.

[424]  arXiv:2404.16708 (cross-list from eess.IV) [pdf, other]
Title: Multi-view Cardiac Image Segmentation via Trans-Dimensional Priors
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

We propose a novel multi-stage trans-dimensional architecture for multi-view cardiac image segmentation. Our method exploits the relationship between long-axis (2D) and short-axis (3D) magnetic resonance (MR) images to perform a sequential 3D-to-2D-to-3D segmentation, segmenting the long-axis and short-axis images. In the first stage, 3D segmentation is performed using the short-axis image, and the prediction is transformed to the long-axis view and used as a segmentation prior in the next stage. In the second step, the heart region is localized and cropped around the segmentation prior using a Heart Localization and Cropping (HLC) module, focusing the subsequent model on the heart region of the image, where a 2D segmentation is performed. Similarly, we transform the long-axis prediction to the short-axis view, localize and crop the heart region and again perform a 3D segmentation to refine the initial short-axis segmentation. We evaluate our proposed method on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) dataset, where our method outperforms state-of-the-art methods in segmenting cardiac regions of interest in both short-axis and long-axis images. The pre-trained models, source code, and implementation details will be publicly available.

[425]  arXiv:2404.16712 (cross-list from math.OC) [pdf, other]
Title: Distributed MPC for PWA Systems Based on Switching ADMM
Comments: 14 pages, 8 figures, submitted to IEEE Transactions on Automatic Control, code available at this https URL and this https URL
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

This paper presents a novel approach for distributed model predictive control (MPC) for piecewise affine (PWA) systems. Existing approaches rely on solving mixed-integer optimization problems, requiring significant computation power or time. We propose a distributed MPC scheme that requires solving only convex optimization problems. The key contribution is a novel method, based on the alternating direction method of multipliers, for solving the non-convex optimal control problem that arises due to the PWA dynamics. We present a distributed MPC scheme, leveraging this method, that explicitly accounts for the coupling between subsystems by reaching agreement on the values of coupled states. Stability and recursive feasibility are shown under additional assumptions on the underlying system. Two numerical examples are provided, in which the proposed controller is shown to significantly improve the CPU time and closed-loop performance over existing state-of-the-art approaches.

[426]  arXiv:2404.16718 (cross-list from eess.IV) [pdf, other]
Title: Features Fusion for Dual-View Mammography Mass Detection
Comments: Accepted at ISBI 2024 (21st IEEE International Symposium on Biomedical Imaging)
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Detection of malignant lesions on mammography images is extremely important for early breast cancer diagnosis. In clinical practice, images are acquired from two different angles, and radiologists can fully utilize information from both views, simultaneously locating the same lesion. However, for automatic detection approaches such information fusion remains a challenge. In this paper, we propose a new model called MAMM-Net, which allows the processing of both mammography views simultaneously by sharing information not only on an object level, as seen in existing works, but also on a feature level. MAMM-Net's key component is the Fusion Layer, based on deformable attention and designed to increase detection precision while keeping high recall. Our experiments show superior performance on the public DDSM dataset compared to the previous state-of-the-art model, while introducing new helpful features such as lesion annotation on pixel-level and classification of lesions malignancy.

[427]  arXiv:2404.16736 (cross-list from quant-ph) [pdf, other]
Title: Lifts of quantum CSS codes
Authors: Virgile Guemard
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)

We propose a notion of lift for quantum CSS codes, inspired by the geometrical construction of Freedman and Hastings. It is based on the existence of a canonical complex associated to any CSS code, that we introduce under the name of Tanner cone-complex, and over which we generate covering spaces. As a first application, we describe the classification of lifts of hypergraph product codes (HPC) and demonstrate the equivalence with the lifted product code (LPC) of Panteleev and Kalachev, including when the linear codes, factors of the HPC, are Tanner codes. As a second application, we report several new non-product constructions of quantum CSS codes, and we apply the prescription to generate their lifts which, for certain selected covering maps, are codes with improved relative parameters compared to the initial one.

[428]  arXiv:2404.16742 (cross-list from math.ST) [pdf, ps, other]
Title: Bayesian Nonparametric Inference in McKean-Vlasov models
Comments: 25 pages
Subjects: Statistics Theory (math.ST); Analysis of PDEs (math.AP); Numerical Analysis (math.NA)

We consider nonparametric statistical inference on a periodic interaction potential $W$ from noisy discrete space-time measurements of solutions $\rho=\rho_W$ of the nonlinear McKean-Vlasov equation, describing the probability density of the mean field limit of an interacting particle system. We show how Gaussian process priors assigned to $W$ give rise to posterior mean estimators that exhibit fast convergence rates for the implied estimated densities $\bar \rho$ towards $\rho_W$. We further show that if the initial condition $\phi$ is not too smooth and satisfies a standard deconvolvability condition, then one can consistently infer the potential $W$ itself at convergence rates $N^{-\theta}$ for appropriate $\theta>0$, where $N$ is the number of measurements. The exponent $\theta$ can be taken to approach $1/2$ as the regularity of $W$ increases corresponding to `near-parametric' models.

[429]  arXiv:2404.16751 (cross-list from quant-ph) [pdf, other]
Title: Efficient unitary designs and pseudorandom unitaries from permutations
Comments: 70 pages, 11 figures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Mathematical Physics (math-ph)

In this work we give an efficient construction of unitary $k$-designs using $\tilde{O}(k\cdot poly(n))$ quantum gates, as well as an efficient construction of a parallel-secure pseudorandom unitary (PRU). Both results are obtained by giving an efficient quantum algorithm that lifts random permutations over $S(N)$ to random unitaries over $U(N)$ for $N=2^n$. In particular, we show that products of exponentiated sums of $S(N)$ permutations with random phases approximately match the first $2^{\Omega(n)}$ moments of the Haar measure. By substituting either $\tilde{O}(k)$-wise independent permutations, or quantum-secure pseudorandom permutations (PRPs) in place of the random permutations, we obtain the above results. The heart of our proof is a conceptual connection between the large dimension (large-$N$) expansion in random matrix theory and the polynomial method, which allows us to prove query lower bounds at finite-$N$ by interpolating from the much simpler large-$N$ limit. The key technical step is to exhibit an orthonormal basis for irreducible representations of the partition algebra that has a low-degree large-$N$ expansion. This allows us to show that the distinguishing probability is a low-degree rational polynomial of the dimension $N$.

[430]  arXiv:2404.16762 (cross-list from physics.chem-ph) [pdf, other]
Title: Analysis of Ethanol Blending Effects on Auto-Ignition and Heat Release in n-Heptane/Ethanol Non-Premixed Flames
Subjects: Chemical Physics (physics.chem-ph); Numerical Analysis (math.NA)

This study delves into the auto-ignition temperature of n-heptane and ethanol mixtures within a counterflow flame configuration under low strain rate, with a particular focus on the impact of ethanol blending on heat release rates. Employing the sensitivity analysis method inspired by Zurada's sensitivity approach for neural network, this study identifies the group of critical species influencing the heat release rate. Further analysis concentration change reveals the intricate interactions among these various radicals across different temperature zones. It is found that, in n-heptane dominant mixtures, inhibition of low-temperature chemistry (LTC) caused by additional ethanol, impacts heat release rate at high temperature zone through diffusion effect of specific radicals such as CH2O, C2H4, C3H6 and H2O2. For ethanol-dominant mixtures, an increase in heat release rate was observed with higher ethanol fraction. Further concentration change analysis elucidated it is primarily attributed to the decomposition of ethanol and its subsequent reactions. This research underscores the significance of incorporating both chemical kinetics and species diffusion effects when analyzing the counterflow configuration of complex fuel mixtures.

[431]  arXiv:2404.16763 (cross-list from math.CO) [pdf, other]
Title: The asymptotic spectrum distance, graph limits, and the Shannon capacity
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Functional Analysis (math.FA)

Determining the Shannon capacity of graphs is a long-standing open problem in information theory, graph theory and combinatorial optimization. Over decades, a wide range of upper and lower bound methods have been developed to analyze this problem. However, despite tremendous effort, even small instances of the problem have remained open.
In recent years, a new dual characterization of the Shannon capacity of graphs, asymptotic spectrum duality, has unified and extended known upper bound methods and structural theorems. In this paper, building on asymptotic spectrum duality, we develop a new theory of graph distance, that we call asymptotic spectrum distance, and corresponding limits (reminiscent of, but different from, the celebrated theory of cut-norm, graphons and flag algebras). We propose a graph limit approach to the Shannon capacity problem: to determine the Shannon capacity of a graph, construct a sequence of easier to analyse graphs converging to it.
(1) We give a very general construction of non-trivial converging sequences of graphs (in a family of circulant graphs).
(2) We construct Cauchy sequences of finite graphs that do not converge to any finite graph, but do converge to an infinite graph. We establish strong connections between convergence questions of finite graphs and the asymptotic properties of Borsuk-like infinite graphs on the circle.
(3) We observe that all best-known lower bound constructions for Shannon capacity of small odd cycles can be obtained from a "finite" version of the graph limit approach. We develop computational and theoretical aspects of this approach and use these to obtain a new Shannon capacity lower bound for the fifteen-cycle.
The theory of asymptotic spectrum distance applies not only to Shannon capacity of graphs; indeed, we will develop it for a general class of mathematical objects and their asymptotic properties.

Replacements for Fri, 26 Apr 24

[432]  arXiv:1902.00615 (replaced) [pdf, other]
Title: Confidence-Triggered Detection: Accelerating Real-time Tracking-by-detection Systems
Comments: To be appeared in 2024 5th International Conference on Electronic Communication and Artificial Intelligence
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[433]  arXiv:1902.10983 (replaced) [pdf, other]
Title: Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number
Subjects: Data Structures and Algorithms (cs.DS)
[434]  arXiv:2004.07558 (replaced) [pdf, other]
Title: Framework for $\exists \mathbb{R}$-Completeness of Two-Dimensional Packing Problems
Journal-ref: TheoretiCS, Volume 3 (2024), Article 11, 1-78
Subjects: Computational Geometry (cs.CG); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
[435]  arXiv:2105.08899 (replaced) [pdf, other]
Title: FairCMS: Cloud Media Sharing with Fair Copyright Protection
Comments: Accepted by IEEE Transactions on Computational Social Systems
Subjects: Cryptography and Security (cs.CR); Multimedia (cs.MM)
[436]  arXiv:2108.08930 (replaced) [pdf, other]
Title: Cross-Silo Federated Learning for Multi-Tier Networks with Vertical and Horizontal Data Partitioning
Comments: Published in ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2022. Updated minor typos in the proof
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[437]  arXiv:2111.10409 (replaced) [pdf, other]
Title: The Acrobatics of BQP
Comments: 64 pages. V2: various writing improvements. V3: minor fixes to spelling and references. V4: corrected an error in what is now Lemma 53
Journal-ref: 37th Computational Complexity Conference (CCC 2022), Leibniz International Proceedings in Informatics (LIPIcs) 234, pp. 20:1-20:17 (2022)
Subjects: Computational Complexity (cs.CC); Quantum Physics (quant-ph)
[438]  arXiv:2201.10017 (replaced) [pdf, ps, other]
Title: Online Convex Optimization Using Coordinate Descent Algorithms
Comments: Accepted for publication in Automatica
Journal-ref: Automatica, vol. 165, Article 111681, 2024
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[439]  arXiv:2204.00955 (replaced) [pdf, other]
Title: FIRST: FrontrunnIng Resilient Smart ConTracts
Comments: 16 pages, 5 figures
Subjects: Cryptography and Security (cs.CR)
[440]  arXiv:2204.01707 (replaced) [pdf, other]
Title: Quadratic Neuron-empowered Heterogeneous Autoencoder for Unsupervised Anomaly Detection
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
[441]  arXiv:2205.06891 (replaced) [pdf, ps, other]
Title: Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation
Comments: Accepted by IEEE Transactions on Artificial Intelligence
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
[442]  arXiv:2206.08479 (replaced) [pdf, other]
Title: Modifying the Asynchronous Jacobi Method for Data Corruption Resilience
Comments: Submitted to SIAM SISC September 29, 2023 Revision submitted to SIAM SISC April 7, 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
[443]  arXiv:2206.09821 (replaced) [pdf, other]
Title: Exceedance Probability Forecasting via Regression for Significant Wave Height Prediction
Comments: code available, 22 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[444]  arXiv:2207.09993 (replaced) [pdf, other]
Title: Computing Tree Decompositions with Small Independence Number
Comments: Accepted at ICALP 2024
Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
[445]  arXiv:2208.03346 (replaced) [pdf, other]
Title: Active Learning for Non-Parametric Choice Models
Authors: Fransisca Susan (1), Negin Golrezaei (2), Ehsan Emamjomeh-Zadeh (3), David Kempe (4) ((1) MIT Operations Research Center, (2) MIT Sloan School of Management, (3) Meta Platforms, Inc., (4) University of Southern California, Los Angeles)
Comments: 45 pages, 4 figures
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
[446]  arXiv:2208.05565 (replaced) [pdf, other]
Title: Stable Homology-Based Cycle Centrality Measures
Comments: version 2, 23 pages, includes application to fractal-like data
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)
[447]  arXiv:2209.07936 (replaced) [pdf, other]
Title: PA-Boot: A Formally Verified Authentication Protocol for Multiprocessor Secure Boot
Subjects: Cryptography and Security (cs.CR); Hardware Architecture (cs.AR)
[448]  arXiv:2210.02498 (replaced) [pdf, other]
Title: Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
Comments: added details about a human evaluation
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[449]  arXiv:2301.04900 (replaced) [pdf, other]
Title: Stretched and measured neural predictions of complex network dynamics
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[450]  arXiv:2301.12577 (replaced) [pdf, other]
Title: hp-version analysis for arbitrarily shaped elements on the boundary discontinuous Galerkin method for Stokes systems
Subjects: Numerical Analysis (math.NA)
[451]  arXiv:2301.12600 (replaced) [pdf, other]
Title: Bagging Provides Assumption-free Stability
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[452]  arXiv:2302.04378 (replaced) [pdf, ps, other]
Title: Parallel Derandomization for Coloring
Comments: 26 Pages. The paper will appear in IPDPS 2024
Subjects: Data Structures and Algorithms (cs.DS)
[453]  arXiv:2302.14648 (replaced) [pdf, other]
Title: Digital Over-the-Air Federated Learning in Multi-Antenna Systems
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[454]  arXiv:2302.14661 (replaced) [pdf, other]
Title: Explainability as a Requirement for Hardware: Introducing Explainable Hardware (XHW)
Subjects: Computers and Society (cs.CY)
[455]  arXiv:2303.07985 (replaced) [src]
Title: Logarithmic Weisfeiler--Leman and Treewidth
Comments: There were minor bugs in this version. We corrected those bugs and folded this result into a different paper: arXiv:2306.17777
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Logic in Computer Science (cs.LO); Combinatorics (math.CO)
[456]  arXiv:2303.08971 (replaced) [pdf, other]
Title: Stable Set Polytopes with High Lift-and-Project Ranks for the Lovász-Schrijver SDP Operator
Comments: Due to the suggestions of an editor and some of the referees, the first version of this manuscript was split into two manuscripts. The new second manuscript which contains some additional results is arXiv:2401.01476
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[457]  arXiv:2303.09841 (replaced) [pdf, other]
Title: GADformer: A Transparent Transformer Model for Group Anomaly Detection on Trajectories
Comments: accepted at International Joint Conference on Neural Networks (IJCNN) 2024, Yokohama, Japan
Subjects: Machine Learning (cs.LG)
[458]  arXiv:2303.12402 (replaced) [pdf, other]
Title: Benders decomposition algorithms for minimizing the spread of harmful contagions in networks
Subjects: Optimization and Control (math.OC); Discrete Mathematics (cs.DM)
[459]  arXiv:2303.13226 (replaced) [pdf, other]
Title: LearnedFTL: A Learning-Based Page-Level FTL for Reducing Double Reads in Flash-Based SSDs
Comments: Published in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA'24)
Subjects: Hardware Architecture (cs.AR); Operating Systems (cs.OS)
[460]  arXiv:2304.00083 (replaced) [pdf, other]
Title: A Generative Framework for Low-Cost Result Validation of Machine Learning-as-a-Service Inference
Comments: 15 pages, 12 figures
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[461]  arXiv:2304.06768 (replaced) [pdf, other]
Title: Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning
Comments: 9 pages, 10 figures, 3 tables
Journal-ref: Phys. Rev. Research 6, 023069 (2024)
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[462]  arXiv:2304.10552 (replaced) [pdf, ps, other]
Title: Approximation and interpolation of deep neural networks
Comments: This is a revised, improved and more general result than the previous version
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Machine Learning (stat.ML)
[463]  arXiv:2304.11014 (replaced) [pdf, other]
Title: Effectful Semantics in Bicategories: Strong, Commutative, and Concurrent Pseudomonads
Comments: To appear in LICS 2024. Replaces previous draft "Strong pseudomonads and premonoidal bicategories"
Subjects: Logic in Computer Science (cs.LO); Category Theory (math.CT)
[464]  arXiv:2305.08275 (replaced) [pdf, other]
Title: ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Comments: CVPR2024
Journal-ref: CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[465]  arXiv:2305.10506 (replaced) [pdf, other]
Title: Exact Recovery for System Identification with More Corrupt Data than Clean Data
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[466]  arXiv:2305.10722 (replaced) [pdf, other]
Title: Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[467]  arXiv:2305.12517 (replaced) [pdf, other]
Title: Description-Based Text Similarity
Comments: A preprint
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[468]  arXiv:2305.15792 (replaced) [pdf, other]
Title: IDEA: Invariant Defense for Graph Adversarial Robustness
Comments: Submitted to Information Sciences
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[469]  arXiv:2305.16886 (replaced) [pdf, other]
Title: Understanding Sparse Neural Networks from their Topology via Multipartite Graph Representations
Comments: Accepted at Transactions on Machine Learning Research (TMLR)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[470]  arXiv:2305.19211 (replaced) [pdf, other]
[471]  arXiv:2306.00008 (replaced) [pdf, other]
Title: Brainformers: Trading Simplicity for Efficiency
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[472]  arXiv:2306.08832 (replaced) [pdf, other]
Title: Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[473]  arXiv:2306.09174 (replaced) [pdf, ps, other]
Title: ANOVA approximation with mixed tensor product basis on scattered points
Subjects: Numerical Analysis (math.NA)
[474]  arXiv:2306.16021 (replaced) [pdf, other]
Title: Structure in Deep Reinforcement Learning: A Survey and Open Problems
Comments: Published at the Journal of Artificial Intelligence Research, Volume 79, Pages 1167-1236
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[475]  arXiv:2306.16122 (replaced) [pdf, other]
Title: Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods
Comments: 17 pages, 6 figures, 12 tables
Journal-ref: TMLR 2024 (https://openreview.net/pdf?id=z5AXLMBWdU)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[476]  arXiv:2306.17777 (replaced) [pdf, ps, other]
Title: Canonizing Graphs of Bounded Rank-Width in Parallel via Weisfeiler--Leman
Comments: Full version of our paper that will appear in SWAT 2024
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Logic in Computer Science (cs.LO); Combinatorics (math.CO)
[477]  arXiv:2307.02180 (replaced) [pdf, ps, other]
Title: Runtime Repeated Recursion Unfolding in CHR: A Just-In-Time Online Program Optimization Strategy That Can Achieve Super-Linear Speedup
Authors: Thom Fruehwirth
Comments: Minor Revision of major revision of submission to Journal Fundamenta Informaticae
Subjects: Programming Languages (cs.PL); Computational Complexity (cs.CC); Performance (cs.PF); Symbolic Computation (cs.SC)
[478]  arXiv:2307.03812 (replaced) [pdf, ps, other]
Title: Coordinate-based neural representations for computational adaptive optics in widefield microscopy
Comments: 36 pages, 5 figures
Subjects: Image and Video Processing (eess.IV); Systems and Control (eess.SY); Optics (physics.optics)
[479]  arXiv:2307.04270 (replaced) [pdf, ps, other]
Title: A Complete Finite Axiomatisation of the Equational Theory of Common Meadows
Subjects: Logic in Computer Science (cs.LO)
[480]  arXiv:2307.12062 (replaced) [pdf, other]
Title: Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations
Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[481]  arXiv:2307.15388 (replaced) [pdf, other]
Title: An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Geophysics (physics.geo-ph)
[482]  arXiv:2307.16811 (replaced) [pdf, other]
Title: DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures
Comments: 15 pages, 7 figures, 4 tables
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
[483]  arXiv:2308.00994 (replaced) [pdf, other]
Title: SYNAuG: Exploiting Synthetic Data for Data Imbalance Problems
Comments: The paper is under consideration at Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[484]  arXiv:2308.10025 (replaced) [pdf, other]
Title: I3: Intent-Introspective Retrieval Conditioned on Instructions
Comments: Accepted by SIGIR 2024
Subjects: Computation and Language (cs.CL)
[485]  arXiv:2308.12388 (replaced) [pdf, other]
Title: Missing Data Imputation Based on Dynamically Adaptable Structural Equation Modeling with Self-Attention
Authors: Ou Deng, Qun Jin
Subjects: Machine Learning (cs.LG)
[486]  arXiv:2309.02961 (replaced) [pdf, other]
Title: LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
Comments: 10 pages, 11 figures
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[487]  arXiv:2309.08925 (replaced) [pdf, other]
Title: DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning
Comments: 16 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[488]  arXiv:2309.11585 (replaced) [pdf, other]
Title: SpeechAlign: a Framework for Speech Translation Alignment Evaluation
Comments: LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
[489]  arXiv:2310.01235 (replaced) [pdf, other]
Title: COIN-LIO: Complementary Intensity-Augmented LiDAR Inertial Odometry
Subjects: Robotics (cs.RO)
[490]  arXiv:2310.01641 (replaced) [pdf, other]
Title: You Only Look at Once for Real-time and Generic Multi-Task
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[491]  arXiv:2310.01967 (replaced) [pdf, other]
Title: Efficient Frontier Management for Collaborative Active SLAM
Comments: 7 pages, 11 figures 3 Tables
Subjects: Robotics (cs.RO)
[492]  arXiv:2310.04870 (replaced) [pdf, other]
Title: Lemur: Integrating Large Language Models in Automated Program Verification
Comments: Accepted at ICLR'24
Subjects: Formal Languages and Automata Theory (cs.FL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
[493]  arXiv:2310.05054 (replaced) [pdf, ps, other]
Title: Low-Latency Video Conferencing via Optimized Packet Routing and Reordering
Comments: accepted by IEEE IWQoS 2024
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
[494]  arXiv:2310.07064 (replaced) [pdf, other]
Title: Large Language Models can Learn Rules
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[495]  arXiv:2310.11385 (replaced) [pdf, other]
Title: A voxel-level approach to brain age prediction: A method to assess regional brain aging
Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URL
Journal-ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[496]  arXiv:2310.13572 (replaced) [pdf, other]
Title: Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space
Subjects: Machine Learning (cs.LG)
[497]  arXiv:2310.14117 (replaced) [pdf, other]
Title: ZTD$_{JAVA}$: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies
Comments: 15 pages, 5 figures, 5 tables
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[498]  arXiv:2310.14671 (replaced) [pdf, other]
Title: Scrap Your Schedules with PopDescent
Subjects: Machine Learning (cs.LG)
[499]  arXiv:2310.16139 (replaced) [pdf, other]
Title: Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos
Comments: 17 pages, 18 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[500]  arXiv:2310.16161 (replaced) [pdf, other]
Title: MyriadAL: Active Few Shot Learning for Histopathology
Comments: Accepted to IEEE CAI 2024. 8 pages, 2 figures. Code available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[501]  arXiv:2310.17101 (replaced) [pdf, other]
Title: Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Comments: 6 pages, 4 figures; Accepted by ICME 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[502]  arXiv:2311.02610 (replaced) [pdf, other]
Title: An adaptive standardisation methodology for Day-Ahead electricity price forecasting
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Methodology (stat.ME)
[503]  arXiv:2311.03486 (replaced) [pdf, other]
Title: Fostering Human Learning in Sequential Decision-Making: Understanding the Role of Evaluative Feedback
Subjects: Human-Computer Interaction (cs.HC)
[504]  arXiv:2311.04934 (replaced) [pdf, other]
Title: Prompt Cache: Modular Attention Reuse for Low-Latency Inference
Comments: To appear at MLSys 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[505]  arXiv:2311.07744 (replaced) [pdf, other]
Title: Two-Stage Aggregation with Dynamic Local Attention for Irregular Time Series
Comments: A short version of this paper has been accepted for presentation at the Findings of Machine Learning for Health (ML4H) 2023 conference
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
[506]  arXiv:2311.08035 (replaced) [pdf, other]
Title: Data-driven building energy efficiency prediction using physics-informed neural networks
Comments: 8 pages, 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
[507]  arXiv:2311.09736 (replaced) [pdf, other]
Title: CARE: Extracting Experimental Findings From Clinical Literature
Comments: To appear at NAACL Findings 2024
Subjects: Computation and Language (cs.CL)
[508]  arXiv:2311.11491 (replaced) [pdf, other]
Title: On the Relationship Between Interpretability and Explainability in Machine Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[509]  arXiv:2311.14199 (replaced) [pdf, other]
Title: A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Comments: 26 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[510]  arXiv:2311.15318 (replaced) [pdf, ps, other]
Title: Perspective in Opinion Dynamics on Complex Convex Domains of Time Networks for Addiction, Forgetting
Authors: Yasuko Kawahata
Comments: Discussion Paper:Convex Regions of Opinion Dynamics, Approaches to the Complexity of Binary Consensus with Reference to Addiction and Obliviousness:Integrated Dimer Model Perspective(2023) This paper is partially an attempt to utilize "Generative AI" and was written with educational intent. There are currently no plans for it to become a peer-reviewed paper
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[511]  arXiv:2312.04484 (replaced) [pdf, other]
Title: FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation
Comments: Preprint; 16 pages, 8 figures, 10 tables; Code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[512]  arXiv:2312.04564 (replaced) [pdf, other]
Title: EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Comments: Website: this https URL Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
[513]  arXiv:2312.04902 (replaced) [pdf, other]
Title: BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting
Comments: To Appear in the 45th IEEE Symposium on Security and Privacy, May 20-23, 2024
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
[514]  arXiv:2312.06731 (replaced) [pdf, other]
Title: Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[515]  arXiv:2312.08083 (replaced) [pdf, ps, other]
Title: Training of Neural Networks with Uncertain Data: A Mixture of Experts Approach
Authors: Lucas Luttner
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[516]  arXiv:2312.09040 (replaced) [pdf, other]
Title: STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
Comments: ICASSP 2024 Best Student Paper Awarded. Code URL: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[517]  arXiv:2312.11713 (replaced) [pdf, other]
Title: Indoor and Outdoor 3D Scene Graph Generation via Language-Enabled Spatial Ontologies
Comments: 10 pages, 6 figures, accepted to Robotics and Automation Letters
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[518]  arXiv:2312.12933 (replaced) [pdf, other]
Title: Automated Testing for Text-to-Image Software
Authors: Siqi Gu
Subjects: Software Engineering (cs.SE)
[519]  arXiv:2312.14408 (replaced) [pdf, ps, other]
Title: Extended p-median problems for balancing service efficiency and equality
Comments: 38 pages, 4 tables, 5 figures
Subjects: Computers and Society (cs.CY)
[520]  arXiv:2401.00428 (replaced) [pdf, other]
Title: Training towards significance with the decorrelated event classifier transformer neural network
Authors: Jaebak Kim
Comments: 11 pages, 7 figures, Accepted by PRD
Subjects: High Energy Physics - Experiment (hep-ex); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[521]  arXiv:2401.01349 (replaced) [pdf, ps, other]
Title: The Anatomy Spread of Online Opinion Polarization: The Pivotal Role of Super-Spreaders in Social Networks
Authors: Yasuko Kawahata
Comments: Detection Method for Social Subsets Consisting of Anti-Network Construction for Unilateral Preference Behavior on Directed Temporal Networks(2023)
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
[522]  arXiv:2401.03051 (replaced) [pdf, other]
Title: On the Stability of a non-hyperbolic nonlinear map with non-bounded set of non-isolated fixed points with applications to Machine Learning
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
[523]  arXiv:2401.05722 (replaced) [pdf, other]
Title: Micromagnetic simulations of the size dependence of the Curie temperature in ferromagnetic nanowires and nanolayers
Comments: 28 pages
Journal-ref: Journal of Magnetism and Magnetic Materials 598 (2024) 172040
Subjects: Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Numerical Analysis (math.NA)
[524]  arXiv:2401.06209 (replaced) [pdf, other]
Title: Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[525]  arXiv:2401.08903 (replaced) [pdf, other]
Title: Rethinking Impersonation and Dodging Attacks on Face Recognition Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[526]  arXiv:2401.09643 (replaced) [pdf, ps, other]
Title: OFDM Reference Signal Pattern Design Criteria for Integrated Communication and Sensing
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[527]  arXiv:2401.09648 (replaced) [pdf, ps, other]
Title: Staggered Comb Reference Signal Design for Integrated Communication and Sensing
Comments: accepted by IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. arXiv admin note: substantial text overlap with arXiv:2401.09643
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[528]  arXiv:2401.12794 (replaced) [pdf, other]
Title: Benchmarking LLMs via Uncertainty Quantification
Comments: 25 pages, preprints
Subjects: Computation and Language (cs.CL)
[529]  arXiv:2401.13516 (replaced) [pdf, other]
Title: Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces
Comments: arXiv admin note: substantial text overlap with arXiv:2308.09921, arXiv:2305.05943
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
[530]  arXiv:2401.14828 (replaced) [pdf, other]
Title: TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Comments: Accpeted by Siggraph 2024 & ACM Transactions on Graphics
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[531]  arXiv:2401.15298 (replaced) [pdf, other]
Title: Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring
Subjects: Software Engineering (cs.SE)
[532]  arXiv:2401.15407 (replaced) [pdf, other]
Title: Euler-Maruyama approximation for stochastic fractional neutral integro-differential equations with weakly singular kernel
Subjects: Numerical Analysis (math.NA); Classical Analysis and ODEs (math.CA)
[533]  arXiv:2402.00281 (replaced) [pdf, other]
Title: Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues
Comments: 15 pages, 11 figures, 3 tables, International Conference on Automatic Face and Gesture Recognition (FG 2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[534]  arXiv:2402.04248 (replaced) [pdf, other]
Title: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Comments: Changes in v2: experiments on formal language ICL and explorations of width vs. depth on ICL; code repo available (24 pages, 10 figures)
Subjects: Machine Learning (cs.LG)
[535]  arXiv:2402.05224 (replaced) [pdf, other]
Title: VerAs: Verify then Assess STEM Lab Reports
Comments: It is accepted to AIED2024!
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[536]  arXiv:2402.06852 (replaced) [pdf, ps, other]
Title: ChemLLM: A Chemical Large Language Model
Comments: 9 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[537]  arXiv:2402.07314 (replaced) [pdf, other]
Title: Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Comments: RLHF, Preference Learning, Alignment for LLMs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[538]  arXiv:2402.07635 (replaced) [pdf, other]
Title: Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Comments: Accepted by CVPR2024. Website link: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[539]  arXiv:2402.09216 (replaced) [pdf, other]
Title: AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails
Comments: To be presented at Learning@Scale 2024
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
[540]  arXiv:2402.15264 (replaced) [pdf, other]
Title: DEEM: Dynamic Experienced Expert Modeling for Stance Detection
Comments: Accepted by LREC-COLING 2024, Oral presentation
Subjects: Computation and Language (cs.CL)
[541]  arXiv:2402.17493 (replaced) [pdf, ps, other]
Title: Predicting postoperative risks using large language models
Comments: Supplemental file available at: this https URL models publicly available at: this https URL AND this https URL
Subjects: Computation and Language (cs.CL)
[542]  arXiv:2402.17660 (replaced) [pdf, other]
Title: TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations
Comments: Version accepted in Journal of Chemical Theory and Computation
Subjects: Machine Learning (cs.LG); Biological Physics (physics.bio-ph); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph)
[543]  arXiv:2402.18558 (replaced) [pdf, other]
Title: Unifying F1TENTH Autonomous Racing: Survey, Methods and Benchmarks
Comments: 12 pages, 18 figures. Sumbitted for publication
Subjects: Robotics (cs.RO)
[544]  arXiv:2403.00024 (replaced) [pdf, other]
Title: FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking
Comments: arXiv admin note: text overlap with arXiv:2402.18611
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[545]  arXiv:2403.00980 (replaced) [pdf, other]
Title: Even-Ifs From If-Onlys: Are the Best Semi-Factual Explanations Found Using Counterfactuals As Guides?
Comments: 16 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)
[546]  arXiv:2403.01150 (replaced) [pdf, ps, other]
Title: Sequential Rotations and Error Analysis of a Simple Quaternion Estimator
Subjects: Methodology (stat.ME); Systems and Control (eess.SY)
[547]  arXiv:2403.01658 (replaced) [pdf, ps, other]
Title: Noise sensitivity and stability on groups
Authors: Ryokichi Tanaka
Subjects: Probability (math.PR); Discrete Mathematics (cs.DM)
[548]  arXiv:2403.02331 (replaced) [pdf, ps, other]
Title: Toward Neuromic Computing: Neurons as Autoencoders
Authors: Larry Bull
Subjects: Neural and Evolutionary Computing (cs.NE)
[549]  arXiv:2403.02411 (replaced) [pdf, other]
Title: NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[550]  arXiv:2403.02932 (replaced) [pdf, other]
Title: RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules
Comments: Accepted by WWW 2024
Subjects: Computation and Language (cs.CL)
[551]  arXiv:2403.03218 (replaced) [pdf, other]
[552]  arXiv:2403.03268 (replaced) [pdf, other]
Title: A Transient Thermal Model for Power Electronics Systems
Comments: Accepted for publication in IEEE Southeastcon 24
Subjects: Numerical Analysis (math.NA)
[553]  arXiv:2403.05530 (replaced) [pdf, other]
[554]  arXiv:2403.05743 (replaced) [pdf, ps, other]
Title: Forecasting Electricity Market Signals via Generative AI
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); General Economics (econ.GN)
[555]  arXiv:2403.06862 (replaced) [pdf, other]
Title: Real-Time Simulated Avatar from Head-Mounted Sensors
Comments: CVPR 2024 Hightlight. Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Robotics (cs.RO)
[556]  arXiv:2403.07592 (replaced) [pdf, other]
Title: Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[557]  arXiv:2403.07789 (replaced) [pdf, other]
Title: RobotCycle: Assessing Cycling Safety in Urban Environments
Comments: IEEE Intelligent Vehicles Symposium (IV 2024)
Subjects: Robotics (cs.RO)
[558]  arXiv:2403.08209 (replaced) [pdf, other]
Title: Height-bounded Lempel-Ziv encodings
Comments: abstract shortened to fit arxiv requirements
Subjects: Data Structures and Algorithms (cs.DS)
[559]  arXiv:2403.08291 (replaced) [pdf, other]
Title: CleanAgent: Automating Data Standardization with LLM-based Agents
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
[560]  arXiv:2403.08733 (replaced) [pdf, other]
Title: GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Comments: Our Project Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[561]  arXiv:2403.11236 (replaced) [pdf, other]
Title: ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization
Comments: Accepted by LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
[562]  arXiv:2403.11807 (replaced) [pdf, other]
Title: How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Comments: 16 pages of main text. 11 pages of appendices. 15 figures, 9 tables. Updated scoring scheme
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[563]  arXiv:2403.11893 (replaced) [pdf, other]
Title: Quantum Coordination Rates in Multi-Partite Networks
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
[564]  arXiv:2403.13123 (replaced) [pdf, ps, other]
Title: On incomplete Cholesky factorizations in half precision arithmetic
Comments: 18 pages
Subjects: Numerical Analysis (math.NA)
[565]  arXiv:2403.13162 (replaced) [pdf, other]
Title: A Textbook Solution for Dynamic Strings
Subjects: Data Structures and Algorithms (cs.DS)
[566]  arXiv:2403.14713 (replaced) [pdf, other]
Title: Auditing Fairness under Unobserved Confounding
Comments: AISTATS 2024
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Methodology (stat.ME); Machine Learning (stat.ML)
[567]  arXiv:2403.15360 (replaced) [pdf, other]
Title: SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)
[568]  arXiv:2403.18442 (replaced) [pdf, other]
Title: Backpropagation-free Network for 3D Test-time Adaptation
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[569]  arXiv:2403.18623 (replaced) [pdf, other]
Title: Antitrust, Amazon, and Algorithmic Auditing
Comments: The paper has been accepted to appear at Journal of Institutional and Theoretical Economics (JITE) 2024
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
[570]  arXiv:2403.19461 (replaced) [pdf, other]
Title: Learning Sampling Distribution and Safety Filter for Autonomous Driving with VQ-VAE and Differentiable Optimization
Subjects: Robotics (cs.RO)
[571]  arXiv:2403.20194 (replaced) [pdf, other]
Title: ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
Subjects: Multimedia (cs.MM)
[572]  arXiv:2404.00680 (replaced) [pdf, other]
Title: Learning to Rank Patches for Unbiased Image Redundancy Reduction
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[573]  arXiv:2404.03273 (replaced) [pdf, other]
Title: Gaussian-Smoothed Sliced Probability Divergences
Authors: Mokhtar Z. Alaya (LMAC), Alain Rakotomamonjy (LITIS), Maxime Berar (LITIS), Gilles Gasso (LITIS)
Comments: arXiv admin note: substantial text overlap with arXiv:2110.10524
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[574]  arXiv:2404.03537 (replaced) [pdf, other]
Title: If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces
Comments: Accepted as full paper at FG 2024 main track
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[575]  arXiv:2404.04496 (replaced) [pdf, other]
Title: Towards Better Graph Neural Neural Network-based Fault Localization Through Enhanced Code Representation
Subjects: Software Engineering (cs.SE)
[576]  arXiv:2404.05212 (replaced) [pdf, other]
Title: DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation
Authors: Yingtao Tian
Comments: Accepted in 15th International Conference on Computational Creativity, ICCC'24
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[577]  arXiv:2404.05297 (replaced) [pdf, other]
Title: Automated Attack Synthesis for Constant Product Market Makers
Comments: 12 pages, 8 figures
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
[578]  arXiv:2404.06069 (replaced) [pdf, ps, other]
Title: Fully Dynamic Matching and Ordered Ruzsa-Szemerédi Graphs
Subjects: Data Structures and Algorithms (cs.DS)
[579]  arXiv:2404.08320 (replaced) [pdf, other]
Title: A splitting, discontinuous Galerkin solver for the cell-by-cell electroneutral Nernst-Planck framework
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[580]  arXiv:2404.09174 (replaced) [pdf, other]
Title: Investigating the impact of virtual element misalignment in collaborative Augmented Reality experiences
Comments: Paper accepted for publication/presentation at QoMEX 2024
Subjects: Human-Computer Interaction (cs.HC)
[581]  arXiv:2404.09779 (replaced) [pdf, other]
Title: A replica analysis of under-bagging
Comments: 25 pages, 7 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG)
[582]  arXiv:2404.09987 (replaced) [pdf, other]
Title: OneChart: Purify the Chart Structural Extraction via One Auxiliary Token
Comments: 14 pages, 9 figures and 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[583]  arXiv:2404.10201 (replaced) [pdf, other]
Title: Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages
Comments: Fixed author ordering
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)
[584]  arXiv:2404.10209 (replaced) [pdf, other]
Title: Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[585]  arXiv:2404.10536 (replaced) [pdf, ps, other]
Title: Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
Comments: Author accepted version of paper in the PERMAVOST workshop at the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 24)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[586]  arXiv:2404.11003 (replaced) [pdf, other]
Title: InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification
Comments: IJCAI 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[587]  arXiv:2404.12096 (replaced) [pdf, other]
Title: LongEmbed: Extending Embedding Models for Long Context Retrieval
Comments: Fix results for Nomic
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[588]  arXiv:2404.12127 (replaced) [pdf, other]
Title: Personalized Forgetting Mechanism with Concept-Driven Knowledge Tracing
Comments: under review
Subjects: Artificial Intelligence (cs.AI)
[589]  arXiv:2404.12390 (replaced) [pdf, other]
Title: BLINK: Multimodal Large Language Models Can See but Not Perceive
Comments: Multimodal Benchmark, Project Url: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[590]  arXiv:2404.12457 (replaced) [pdf, other]
Title: RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computation and Language (cs.CL); Machine Learning (cs.LG)
[591]  arXiv:2404.12633 (replaced) [pdf, other]
Title: FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation
Comments: Accepted by IJCAI 2024
Subjects: Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
[592]  arXiv:2404.12724 (replaced) [pdf, other]
Title: Graph Convolutional Network For Semi-supervised Node Classification With Subgraph Sketching
Authors: Zibin Huang, Jun Xian
Comments: 20 pages, 12 figures, Summitted to ICPP 2024
Subjects: Machine Learning (cs.LG)
[593]  arXiv:2404.12812 (replaced) [pdf, other]
Title: Algorithmic Changes Are Not Enough: Evaluating the Removal of Race Adjustment from the eGFR Equation
Comments: Accepted to Conference on Health, Inference, and Learning (CHIL) 2024
Subjects: Computers and Society (cs.CY); Applications (stat.AP)
[594]  arXiv:2404.13016 (replaced) [pdf, other]
Title: Optimizing Calibration by Gaining Aware of Prediction Correctness
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[595]  arXiv:2404.13066 (replaced) [pdf, other]
Title: Leveraging Large Language Model as Simulated Patients for Clinical Education
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[596]  arXiv:2404.13414 (replaced) [pdf, other]
Title: Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study
Comments: Accepted to Learning @ Scale 2024
Subjects: Human-Computer Interaction (cs.HC)
[597]  arXiv:2404.13468 (replaced) [pdf, other]
Title: A Grassroots Architecture to Supplant Global Digital Platforms by a Global Digital Democracy
Authors: Ehud Shapiro
Subjects: Networking and Internet Architecture (cs.NI); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA); Social and Information Networks (cs.SI)
[598]  arXiv:2404.13489 (replaced) [pdf, other]
Title: SCHENO: Measuring Schema vs. Noise in Graphs
Subjects: Databases (cs.DB)
[599]  arXiv:2404.13503 (replaced) [pdf, other]
Title: Predict to Minimize Swap Regret for All Payoff-Bounded Tasks
Authors: Lunjia Hu, Yifan Wu
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[600]  arXiv:2404.13532 (replaced) [pdf, other]
Title: SpringGrasp: Synthesizing Compliant, Dexterous Grasps under Shape Uncertainty
Subjects: Robotics (cs.RO)
[601]  arXiv:2404.13591 (replaced) [pdf, other]
Title: MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[602]  arXiv:2404.14061 (replaced) [pdf, other]
Title: FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning
Comments: Accepted by IJCAI 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Social and Information Networks (cs.SI)
[603]  arXiv:2404.14246 (replaced) [pdf, other]
Title: Chain of trust: Unraveling references among Common Criteria certified products
Subjects: Cryptography and Security (cs.CR)
[604]  arXiv:2404.14250 (replaced) [pdf, ps, other]
Title: Frosty: Bringing strong liveness guarantees to the Snow family of consensus protocols
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[605]  arXiv:2404.14560 (replaced) [pdf, other]
Title: Adaptive Local Binary Pattern: A Novel Feature Descriptor for Enhanced Analysis of Kidney Abnormalities in CT Scan Images using ensemble based Machine Learning Approach
Comments: 17 pages, 5 tables, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[606]  arXiv:2404.14604 (replaced) [pdf, other]
Title: Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
Subjects: Computation and Language (cs.CL)
[607]  arXiv:2404.14700 (replaced) [pdf, other]
Title: FlashSpeech: Efficient Zero-Shot Speech Synthesis
Comments: Efficient zero-shot speech synthesis
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[608]  arXiv:2404.14829 (replaced) [pdf, other]
Title: Revisiting Neural Networks for Continual Learning: An Architectural Perspective
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[609]  arXiv:2404.14831 (replaced) [pdf, other]
Title: Towards Universal Dense Blocking for Entity Resolution
Comments: Code and data are available at this this https URL
Subjects: Databases (cs.DB); Computation and Language (cs.CL); Information Retrieval (cs.IR)
[610]  arXiv:2404.14885 (replaced) [pdf, other]
Title: Domain adaptive pose estimation via multi-level alignment
Comments: accepted to icme2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[611]  arXiv:2404.15009 (replaced) [pdf, other]
[612]  arXiv:2404.15377 (replaced) [pdf, other]
Title: Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[613]  arXiv:2404.15419 (replaced) [pdf, ps, other]
Title: Using Deep Learning to Identify Initial Error Sensitivity of ENSO Forecasts
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)
[614]  arXiv:2404.15510 (replaced) [pdf, other]
Title: NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
Comments: Visit this https URL for WebGUI based simulations
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[615]  arXiv:2404.15606 (replaced) [pdf, other]
Title: Multilevel Particle Filters for Partially Observed McKean-Vlasov Stochastic Differential Equations
Comments: 21 pages, 2 figures
Subjects: Numerical Analysis (math.NA); Computation (stat.CO)
[616]  arXiv:2404.15667 (replaced) [pdf, other]
Title: The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews
Comments: Accepted to the International Conference on Evaluation and Assessment in Software Engineering (EASE), 2024 edition
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
[617]  arXiv:2404.15678 (replaced) [pdf, other]
Title: Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
[618]  arXiv:2404.15719 (replaced) [pdf, other]
Title: HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[619]  arXiv:2404.15735 (replaced) [pdf, other]
Title: Replacing Cryptopuzzles with Useful Computation in Blockchain Proof-of-Work Protocols
Comments: Submitted to ACM Computing Surveys
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[620]  arXiv:2404.15736 (replaced) [pdf, other]
Title: What Makes Multimodal In-Context Learning Work?
Comments: 20 pages, 16 figures. Accepted to CVPR 2024 Workshop on Prompting in Vision. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[621]  arXiv:2404.15882 (replaced) [pdf, ps, other]
Title: Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Comments: Published as a conference paper at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
[622]  arXiv:2404.15891 (replaced) [pdf, other]
Title: OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation
Comments: arXiv admin note: text overlap with arXiv:2311.17061 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[623]  arXiv:2404.15943 (replaced) [pdf, other]
Title: Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme
Comments: 15 pages, 9 figures, 3 pages theory
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[624]  arXiv:2404.15967 (replaced) [pdf, other]
Title: Interpretable Clustering with the Distinguishability Criterion
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[625]  arXiv:2404.16012 (replaced) [pdf, other]
Title: GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
[ total of 625 entries: 1-625 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2404, contact, help  (Access key information)