UCLouvain
Abstract: We discuss new challenges in modern science created by Artificial Intelligence (AI). Indeed, AI requires a system of new sciences, based mainly on computational models. Their development has already been started by progress in Computational Mathematics. In this new reality, Optimization plays an important role, helping the other fields find tractable models and efficient methods, and significantly increasing their predictive power. We support our conclusions with several examples of efficient optimization schemes related to human activity.
The Hong Kong Polytechnic University
Abstract: In this talk, we shall explain why nonsmooth analysis plays a critical role in solving large-scale sparse optimization problems. We start by introducing some basic concepts such as Rademacher's theorem and the Moreau-Yosida regularization of convex functions. Then we turn to semismooth analysis, including inverse and implicit function theorems, to demonstrate why nonsmooth systems are indispensable for solving constrained optimization problems and why smooth systems inevitably lead to singularity. Finally, we shall illustrate how we can employ nonsmooth analysis to design highly efficient sparse nonsmooth secant/Newton methods for solving several important machine learning models with sparse solutions, including convex clustering, lasso, and exclusive lasso.
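As a small, hedged illustration of the Moreau-Yosida regularization mentioned above (our own Python sketch, not code from the talk): for f = ||.||_1 the proximal mapping is the soft-thresholding operator, the Moreau envelope is continuously differentiable, and its gradient x -> (x - prox(x))/lam is exactly the piecewise-linear, semismooth map on which semismooth secant/Newton methods operate.

    import numpy as np

    # Moreau-Yosida regularization of f = ||.||_1: the envelope
    # e_lam(x) = min_u ||u||_1 + ||x - u||^2 / (2*lam) is smooth even though
    # f is not, and grad e_lam(x) = (x - prox_{lam f}(x)) / lam is piecewise
    # linear, hence semismooth.

    def prox_l1(x, lam):
        # soft-thresholding: the proximal mapping of lam * ||.||_1
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def moreau_envelope_l1(x, lam):
        p = prox_l1(x, lam)
        return np.sum(np.abs(p)) + np.sum((x - p) ** 2) / (2 * lam)

    x, lam, eps = np.array([1.5, -0.2, 0.0, 3.0]), 0.5, 1e-6
    grad = (x - prox_l1(x, lam)) / lam
    # finite-difference check of the gradient formula in the first coordinate
    e = np.zeros_like(x); e[0] = eps
    fd = (moreau_envelope_l1(x + e, lam) - moreau_envelope_l1(x - e, lam)) / (2 * eps)
    print(grad[0], fd)  # the two values should agree to ~1e-6

A semismooth Newton method additionally uses an element of the generalized Jacobian of the soft-thresholding map (here a 0/1 diagonal matrix), which is what makes the resulting Newton systems sparse and cheap for lasso-type problems.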
The University of Hong Kong
Abstract: In this talk, we provide a more systematic and principled view of the practice of artificial intelligence in the past decade, seen from the history of the study of intelligence. We argue that the most fundamental objective of intelligence is to learn a compact and structured representation of the sensed world that maximizes information gain, measurable by coding rates of the learned representation. We contend that optimizing this principled objective provides a unifying white-box explanation for almost all past and current practices of artificial intelligence based on deep networks, including CNNs, ResNets, and Transformers. Hence, mathematically interpretable, practically competitive, and semantically meaningful deep networks are now within our reach; see our latest release: https://ma-lab-berkeley.github.io/CRATE/. Furthermore, our study shows that to learn such representations correctly and automatically, additional computational mechanisms are necessary besides deep networks. For intelligence to become autonomous, one needs to integrate fundamental ideas from coding theory, optimization, feedback control, and game theory. This connects us back to the true origin of the study of intelligence 80 years ago. Perhaps most importantly, this new framework reveals a much broader and brighter future for developing next-generation autonomous intelligent systems that could truly emulate the computational mechanisms of natural intelligence.
Related papers can be found at:
1. https://ma-lab-berkeley.github.io/CRATE/
2. https://jmlr.org/papers/v23/21-0631.html
3. https://www.mdpi.com/1099-4300/24/4/456/htm
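As one concrete handle on the coding-rate objective in the abstract (a minimal sketch following our reading of the papers above; the function and test data are ours, not the authors' code): the rate R(Z) = 1/2 logdet(I + d/(n eps^2) Z Z^T) of n features Z in R^(d x n) grows with the volume spanned by the features, so diverse representations score higher than collapsed ones.

    import numpy as np

    def coding_rate(Z, eps=0.5):
        # R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T) for Z of shape (d, n)
        d, n = Z.shape
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
        return 0.5 * logdet

    rng = np.random.default_rng(0)
    diverse = rng.standard_normal((32, 256))        # spread-out features
    collapsed = np.outer(rng.standard_normal(32),   # rank-1, i.e. collapsed,
                         rng.standard_normal(256))  # features
    print(coding_rate(diverse), coding_rate(collapsed))  # first value is far larger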
AMCS, KAUST
The Pennsylvania State University
Abstract: Deep learning has emerged as a powerful tool with numerous successful applications in big data, notably in tasks such as image classification and natural language processing. Theoretically, such success has often been attributed to the capability of deep neural networks to handle high-dimensional problems. This talk aims to offer theoretical insights into this critical issue and to clarify common misconceptions surrounding it.
National University of Singapore
Abstract: Recent semiconductor scaling trends continue to support the evolution of silicon systems beyond the inevitable end of technology scaling, growing the deployment of intelligent and connected chips towards the trillion range by the end of the decade. Such an evolution vastly exceeds the scale of any application ever deployed by human beings, and its sustained growth is now fundamentally impeded by batteries as the conventional source of energy. From a silicon chip viewpoint, batteries at the trillion scale severely limit advances in cost, form factor, system lifespan and chip availability over time. From a societal perspective, batteries in the trillions threaten the economic and environmental sustainability of the underlying scaling trend, and hence its feasibility.
This talk introduces the key ideas and their silicon demonstrations to enable a new breed of always-on silicon systems, from sensing to computing and wireless communications, with no battery inside (or any other energy storage). Highly power-scalable systems that adapt to the highly fluctuating power profile of energy harvesters are shown to enable next-generation pervasive integrated systems with cost well below $1, size of a few millimeters, and lifetime well beyond the traditional shelf life of batteries, yet at near-100% up-time.
Sensor interfaces, processors and wireless transceivers fitting the existing infrastructure (e.g., WiFi, Bluetooth) with power reductions of orders of magnitude are discussed and exemplified by numerous silicon demonstrations from our research group, along with their system integration. Ultimately, the technological pathway discussed in this talk supports the sustainable growth of applications leveraging large-scale deployments of silicon systems, making our planet smarter. And greener too.
Shenzhen Research Institute of Big Data
Abstract: In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate model parameters. A fast EM algorithm is subsequently developed by utilizing the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated under both settings where the number of latent groups G is over-specified or correctly specified. A data-driven criterion that can consistently identify the true G under mild conditions is derived. Extensive simulation studies and an application to a data set collected from Sina Weibo illustrate the effectiveness of the proposed methodology.
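To make the self-exciting mechanism behind the GNHP model concrete (a generic univariate illustration, not the talk's model or code): a Hawkes process has conditional intensity lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i)), which jumps after every event and then decays. The branching representation exploited by the EM algorithm views each event either as an immigrant arriving at rate mu or as an offspring triggered by an earlier event. The process can be simulated with Ogata's thinning algorithm:

    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_hawkes(mu, alpha, beta, T):
        # Ogata thinning for lambda(t) = mu + sum_{t_i < t} alpha*exp(-beta*(t - t_i));
        # stationarity requires a branching ratio alpha/beta < 1.
        t, events = 0.0, []
        while True:
            # between events the intensity decays, so its current value is an upper bound
            lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
            t += rng.exponential(1.0 / lam_bar)          # candidate event time
            if t > T:
                return np.array(events)
            lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
            if rng.uniform() < lam_t / lam_bar:          # accept (thinning step)
                events.append(t)

    ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.5, T=100.0)
    # the empirical rate should be near mu / (1 - alpha/beta), about 1.07 per unit time
    print(len(ev) / 100.0)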
The Chinese University of Hong Kong
Abstract: The entropy function plays a central role in information theory. Constraints on the entropy function in the form of inequalities, viz. entropy inequalities (often conditional on certain Markov conditions imposed by the problem under consideration), are indispensable tools for proving converse coding theorems. In this talk, I will give an overview of the development of machine-proving of entropy inequalities over the past 25 years. To start with, I will present a geometrical framework for the entropy function and explain how an entropy inequality can be formulated, with or without constraints on the entropy function. Among all entropy inequalities, Shannon-type inequalities, namely those implied by the nonnegativity of Shannon's information measures, are the best understood. We will focus on the proving of Shannon-type inequalities, which in fact can be formulated as a linear programming problem. I will discuss ITIP, a software package originally developed for this purpose in the mid-1990s, as well as some of its later variants. In 2014, Tian successfully characterized the rate region of a class of exact-repair regenerating codes by means of a variant of ITIP. This was the first nontrivial converse coding theorem proved by a machine. At the end of the talk, I will discuss some recent progress in speeding up the proving of entropy inequalities.
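To give a flavour of the linear programming formulation used by ITIP (a toy two-variable sketch of the idea, not the ITIP code): write the entropy vector as h = (H(X), H(Y), H(X,Y)). A candidate inequality c . h >= 0 is Shannon-type if and only if minimizing c . h subject to the elemental inequalities G h >= 0 yields the optimal value 0.

    import numpy as np
    from scipy.optimize import linprog

    # Elemental Shannon inequalities for two random variables, G @ h >= 0,
    # with h = (H(X), H(Y), H(X,Y)):
    G = np.array([
        [-1.0,  0.0,  1.0],   # H(Y|X) = H(X,Y) - H(X) >= 0
        [ 0.0, -1.0,  1.0],   # H(X|Y) = H(X,Y) - H(Y) >= 0
        [ 1.0,  1.0, -1.0],   # I(X;Y) = H(X) + H(Y) - H(X,Y) >= 0
    ])

    def is_shannon_type(c):
        # minimize c @ h subject to G @ h >= 0; Shannon-type iff the minimum is 0
        res = linprog(c, A_ub=-G, b_ub=np.zeros(3),
                      bounds=[(None, None)] * 3, method="highs")
        return res.status == 0 and abs(res.fun) < 1e-9

    print(is_shannon_type([1.0, 1.0, -1.0]))  # H(X) + H(Y) >= H(X,Y): True
    print(is_shannon_type([1.0, -1.0, 0.0]))  # H(X) >= H(Y): False (LP unbounded)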
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: We propose constraint dissolving approaches for optimization problems over a class of Riemannian manifolds. In these approaches, solving a Riemannian optimization problem is transformed into the unconstrained minimization of a constraint dissolving function (CDF). Different from existing exact penalty functions, the exact gradient and Hessian of the CDF are easy to compute. We study the theoretical properties of the CDF and prove that the original problem and the CDF have the same first-order and second-order stationary points, local minimizers, and Łojasiewicz exponents in a neighborhood of the feasible region. Remarkably, the convergence properties of our proposed constraint dissolving approaches can be directly inherited from the existing rich results in unconstrained optimization. The proposed constraint dissolving approaches therefore build shortcuts from unconstrained optimization to Riemannian optimization. Several illustrative examples further demonstrate the potential of the proposed approaches.
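As a hedged toy sketch of the constraint dissolving idea on the unit sphere {x : x^T x = 1} (our own example; we take the dissolving mapping A(x) = x (3 - x^T x)/2, which fixes every feasible point): plain gradient descent on CDF(x) = f(A(x)) + (beta/2)(x^T x - 1)^2 recovers the sphere-constrained minimizer of f(x) = x^T C x, i.e., an eigenvector for the smallest eigenvalue of C.

    import numpy as np

    rng = np.random.default_rng(1)
    n, beta, step = 20, 10.0, 0.01
    C = rng.standard_normal((n, n)); C = (C + C.T) / 2   # symmetric test matrix

    x = rng.standard_normal(n); x /= np.linalg.norm(x)   # start near the feasible set
    for _ in range(5000):
        r = x @ x
        s = (3.0 - r) / 2.0                  # A(x) = s * x, so f(A(x)) = s^2 x^T C x
        g = 2 * s**2 * (C @ x) - 2 * s * (x @ C @ x) * x   # grad of f(A(x))
        g += 2 * beta * (r - 1.0) * x                      # grad of the penalty term
        x -= step * g                        # plain unconstrained gradient step

    x /= np.linalg.norm(x)
    print(x @ C @ x, np.linalg.eigvalsh(C)[0])  # the two values should be close

Note that on the sphere itself the CDF gradient reduces to the Riemannian gradient of f in this example, which is one way to see why unconstrained convergence theory transfers.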
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: One-bit precoding is a promising way to achieve hardware efficiency in massive MIMO systems and has attracted growing research interest in recent years. However, the one-bit nature of the transmit signal poses great challenges to precoding design as well as performance analysis. In this talk, we will present some recent results on one-bit precoding, covering both non-linear and linear-quantized precoding schemes. In particular, for non-linear precoding, we introduce a new negative ℓ1 penalty approach, which is based on an exact penalty model that penalizes the one-bit constraint into the objective with a negative ℓ1-norm term. The negative ℓ1 penalty approach achieves a better trade-off between complexity and symbol error rate (SER) performance than existing approaches. For linear-quantized precoding, we give an asymptotic performance analysis for a wide class of precoders and derive the optimal precoder within the considered class. Different from existing Bussgang-decomposition-based analyses, our analytical framework is based on random matrix theory (RMT), which is more rigorous and can be extended to more general cases.
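To convey the negative ℓ1 penalty idea in its simplest form (a toy real-valued sketch with made-up H, s and parameters, not the precoding formulation from the talk): relax x in {-1, +1}^n to the box [-1, 1]^n and subtract lam * ||x||_1 from the objective. Since -||x||_1 is concave, it pushes minimizers towards the vertices of the box, i.e., towards one-bit signals.

    import numpy as np

    # projected (sub)gradient descent on ||H x - s||^2 - lam * ||x||_1 over [-1, 1]^n
    rng = np.random.default_rng(0)
    m, n, lam, step = 8, 16, 0.5, 0.05
    H = rng.standard_normal((m, n)) / np.sqrt(m)
    s = rng.standard_normal(m)

    x = 1e-3 * rng.standard_normal(n)                 # small start, breaking ties at 0
    for _ in range(2000):
        g = 2 * H.T @ (H @ x - s) - lam * np.sign(x)  # subgradient of the objective
        x = np.clip(x - step * g, -1.0, 1.0)          # projection onto the box

    print(np.round(x, 3))  # many coordinates are driven to the -1/+1 boundary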
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: In this talk we study high-order unfitted finite element methods on Cartesian meshes with hanging nodes for elliptic interface problems, which avoid the work of body-fitted mesh generation and provide a natural way to design high-order methods without resorting to nonlinear element transforms. We introduce the new concepts of large element and interface deviation to address the small cut cell problem of unfitted finite element methods. We construct a reliable algorithm to merge small interface elements with their surrounding elements so as to automatically generate a finite element mesh whose elements are large with respect to both domains. We show novel hp-domain inverse estimates which allow us to prove the stability of the finite element method under practical interface-resolving mesh conditions, and we prove hp a priori and a posteriori error estimates. We propose new basis functions for the interface elements to control the growth of the condition number of the stiffness matrix in terms of the finite element approximation order, the number of elements of the mesh, and the interface deviation. Numerical examples are presented to illustrate the competitive performance of the method. This talk is based on joint work with Ke Li, Yong Liu and Xueshuang Xiang.
CEMSE, KAUST
Abstract: Numerical simulation has become one of the major topics in Computational Science. To promote the modeling and simulation of complex problems, new strategies are needed that allow for the solution of large, complex model systems. Crucial issues for such strategies are reliability, efficiency, robustness, usability, and versatility.
After discussing the needs of large-scale simulation, we point out basic simulation strategies such as adaptivity, parallelism and multigrid solvers. To allow adaptive, parallel computations, the load-balancing problem for dynamically changing grids has to be solved efficiently by fast heuristics. These strategies are combined in the simulation system UG (“Unstructured Grids”), which is presented in the following.
In the second part of the seminar we show the performance and efficiency of this strategy in various applications. In particular, the application and benefit of parallel adaptive multigrid methods for modelling drug permeation through human skin is shown in detail.
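As a concrete illustration of the multigrid strategy mentioned above (a standard textbook 1-D Poisson example in Python, not UG code): a V-cycle combines a few smoothing sweeps on each grid with a coarse-grid correction, giving convergence at a rate independent of the mesh size.

    import numpy as np

    # Geometric multigrid V-cycle for -u'' = f on (0,1), u(0) = u(1) = 0,
    # with weighted-Jacobi smoothing, full-weighting restriction and
    # linear interpolation; the number of intervals n is a power of two.

    def residual(u, f, h):
        r = np.zeros_like(u)
        r[1:-1] = f[1:-1] - (2*u[1:-1] - u[:-2] - u[2:]) / h**2
        return r

    def jacobi(u, f, h, sweeps=3, omega=2/3):
        for _ in range(sweeps):
            u[1:-1] += omega * 0.5 * (h**2 * f[1:-1] + u[:-2] + u[2:] - 2*u[1:-1])
        return u

    def v_cycle(u, f, h):
        n = len(u) - 1
        if n == 2:                                   # coarsest grid: solve exactly
            u[1] = h**2 * f[1] / 2
            return u
        u = jacobi(u, f, h)                          # pre-smoothing
        r = residual(u, f, h)
        rc = np.zeros(n//2 + 1)                      # full-weighting restriction
        rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
        ec = v_cycle(np.zeros_like(rc), rc, 2*h)     # coarse-grid correction
        e = np.zeros_like(u)                         # linear interpolation
        e[2:-1:2] = ec[1:-1]
        e[1::2] = 0.5*(ec[:-1] + ec[1:])
        return jacobi(u + e, f, h)                   # post-smoothing

    n = 256; h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    f = np.pi**2 * np.sin(np.pi * x)                 # exact solution u = sin(pi x)
    u = np.zeros(n + 1)
    for _ in range(10):
        u = v_cycle(u, f, h)
    print(np.max(np.abs(u - np.sin(np.pi * x))))     # small; dominated by discretization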
Shenzhen Research Institute of Big Data
Abstract: With large language models, we have transformed the way we process languages. Are large language models getting closer to solving linguistic problems? In this talk, we will discuss the development of large language models from both historical and theoretical perspectives to answer this question. We will also report the latest developments in GPT research at CUHK Shenzhen and SRIBD.
Fudan University
Abstract: The field of natural language processing has witnessed breakthrough developments in recent years, entering the era of large language models. Various natural language processing tasks are increasingly unified under the generative paradigm, displaying strong universality and significantly enhancing performance across tasks. However, large language models not only pose engineering challenges but also bring new scientific challenges, including issues pertaining to model architecture, reasoning ability, hallucinations, and explainability. This talk mainly introduces the primary scientific challenges of large language models and offers a perspective on future research.
Frankfurt Institute for Advanced Studies (FIAS)
Abstract: In the past, fundamental research at large-scale research facilities has often served as a catalyst for new developments in information processing (just take the internet as an example). In my talk, I will introduce the new strategy of the German communities doing fundamental research on the Universe and Matter on how to enable new methods of information technology and big data analytics in their research fields. These strategies will be important for future large-scale research facilities not only in Germany and Europe (e.g., CERN) but also in China (e.g., HIAF, which is currently under construction in Guangdong province).
This includes explaining the basic ideas behind the research and the challenges with respect to Big Data, AI and computational modelling, and highlighting fundamental differences when it comes to the requirements of fundamental research.
In this context, I will highlight some results from a multidisciplinary approach at FIAS and the Xidian-FIAS Joint Research Centre to apply new methods of ML/DL to similar problems in different fields of research. Such multidisciplinary approaches may be a way forward to bring the newest developments in information science to the fundamental research communities and to allow cross-fertilization.
The Hong Kong University of Science and Technology
Abstract: Application-based taxi and car service e-hailing systems have revolutionized urban mobility by providing on-demand ride services that are timely and convenient. The integration of mathematics, economics, and artificial intelligence is crucial for the development of efficient and sustainable on-demand mobility services, which ultimately benefit customers. This talk will explore the latest developments and research issues in ride-sourcing markets, including demand forecasting, surge pricing, matching and ride-pooling, optimal resource allocation, and the impact of ride-pooling on traffic congestion. Additionally, we will discuss topics such as competition, third-party platform integration, Pareto-efficient market regulations, and the analysis of human mobility and network properties using big car trajectory data.
CEMSE, KAUST
Abstract: The growing interest in Internet of Things (IoT) and mobile Artificial Intelligence applications is pushing the investigation of Deep Neural Networks (DNNs) that can operate at the edge by running on low-resource/low-energy platforms. This has led to the development of a plethora of machine learning techniques at the hardware, algorithm and software levels capable of performing on-device sensor data analytics at extremely low power (typically in the mW range and below), which broadly defines the field of Tiny Machine Learning (TinyML). TinyML offers several notable advantages, including low latency, since algorithms run on edge devices and the need for data transfer to the cloud is reduced. It also minimizes bandwidth requirements, as little to no connectivity is necessary for inference, and it enhances data privacy, as data is not stored on external servers; instead, models operate at the edge.
To make sure that the DNNs resulting from the training phase fit a low-resource platform, several pruning techniques have been proposed in the literature. They aim to reduce the number of interconnections – and consequently the size, and the corresponding computing and storage requirements – of a DNN relying on classic Multiply-and-Accumulate (MAC) neurons. In this talk, we first review some pruning techniques, highlighting their pros and cons. Then, we introduce a novel neuron structure based on a Multiply-And-Max/min (MAM) map-reduce paradigm, and we show that by exploiting this new paradigm it is possible to build naturally and aggressively prunable DNN layers with a negligible loss in performance. In fact, this novel structure allows greater interconnection sparsity than classic MAC-based DNN layers. Moreover, most of the existing state-of-the-art pruning techniques can be used with MAM layers with little to no changes. We present results for AlexNet, VGG16 and ViT-B/16 on CIFAR-10, CIFAR-100 and ImageNet-1K, and show the clear advantages attainable when MAM neurons are exploited.
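A minimal sketch of the MAM layer as we read it from the abstract (shapes and data are illustrative, not the paper's code): where a MAC neuron computes the sum of all products w_i * x_i, a MAM neuron keeps only the largest and the smallest product, so after training every interconnection that never attains the max or the min can be pruned.

    import numpy as np

    def mac_layer(X, W, b):
        # classic dense layer: every product w_i * x_i is accumulated
        return X @ W + b

    def mam_layer(X, W, b):
        # Multiply-And-Max/min: reduce the products with max + min instead of sum
        P = X[:, :, None] * W[None, :, :]       # all products, (batch, n_in, n_out)
        return P.max(axis=1) + P.min(axis=1) + b

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 8))             # batch of 4 inputs with 8 features
    W = rng.standard_normal((8, 3))             # 8 inputs -> 3 neurons
    b = np.zeros(3)
    print(mac_layer(X, W, b).shape, mam_layer(X, W, b).shape)  # (4, 3) (4, 3)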
National Graduate Institute for Policy Studies
Abstract: Semidefinite Programming (SDP) is a unique extension of Linear Programming to symmetric matrices, and is used broadly as a powerful modeling tool in various areas of mathematical/data/computer sciences and engineering. While most algorithms for SDP assume the existence of positive definite feasible solutions for both the primal and the dual, this assumption is not necessarily satisfied in general. Such “bad” SDPs are called singular. There are many singular SDPs in applications where a positive definite feasible solution is ensured to exist only on one side (strong duality holds in this case), and some problems even have finite/infinite nonzero duality gaps. In this talk, we focus on singular SDPs with possibly nonzero duality gaps, and demonstrate that there exists a hidden continuity structure between primal and dual through perturbation. The theory is unique in that it is a perturbation analysis where primal and dual are perturbed simultaneously, and it leads to the relatively surprising consequence that the polynomial-time interior-point algorithm is able to generate sequences converging to a value between the primal and dual optimal values of the original problem, even in the presence of a nonzero duality gap.
HTW Berlin
Zuse Institute Berlin
Abstract: The presence of floating-point roundoff errors compromises the results of virtually all fast mixed-integer programming solvers available today. In this talk, we present recent advances in our endeavour to craft a performant mixed-integer optimizer that is not only free from roundoff errors, but also produces certificates of optimality that can be verified independently of the solving process. We provide an overview of the entire framework, which is an extension of the academic solver SCIP. On the practical side, we focus on the safe generation of Gomory mixed-integer cuts via mixed-integer rounding. On the theoretical side, we present a new proof system for certifying the correctness of optimality-based derivations produced by a broad range of solving techniques. Consistency across all decisions that affect the feasible region is achieved by a pair of transitive relations on the set of solutions, which relies on the newly introduced notion of consistent branching trees. The resulting framework offers practical solutions to enhance trust in integer programming as a methodology for applications where reliability and correctness are key.
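To illustrate the mixed-integer rounding behind the Gomory mixed-integer cuts mentioned above (the textbook pure-integer MIR formula in plain floating point; the talk's point is to generate such cuts safely, e.g. with directed rounding, which this sketch deliberately omits): from a valid row sum_j a_j x_j <= b with x >= 0 integer and f = b - floor(b) > 0, MIR derives sum_j (floor(a_j) + max(f_j - f, 0)/(1 - f)) x_j <= floor(b), where f_j = a_j - floor(a_j).

    from math import floor
    from itertools import product

    def mir_cut(a, b):
        # textbook MIR cut for sum_j a_j x_j <= b, x nonnegative integer
        f = b - floor(b)
        assert f > 1e-9, "MIR needs a fractional right-hand side"
        coeffs = [floor(aj) + max((aj - floor(aj)) - f, 0.0) / (1.0 - f) for aj in a]
        return coeffs, floor(b)

    a, b = [0.7, 1.3], 1.5
    c, rhs = mir_cut(a, b)            # yields (approximately) 0.4*x1 + x2 <= 1
    # brute-force check that the cut is valid for all small integer points
    for x in product(range(4), repeat=2):
        if a[0]*x[0] + a[1]*x[1] <= b + 1e-9:
            assert c[0]*x[0] + c[1]*x[1] <= rhs + 1e-9
    print(c, rhs)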
Shenzhen Research Institute of Big Data
Abstract: Data, models and algorithms are the three key pillars of operations research. Being able to do any of these well can give a huge advantage in solving real-world problems. I will first showcase some of the things we have done in the solver development arena. I will then use examples from the airline and pharmaceutical logistics industries to highlight the importance of combining all these technologies.