Model Selection
497 papers with code • 0 benchmarks • 1 dataset
Given a set of candidate models, the goal of Model Selection is to select the model that best approximates the observed data and captures its underlying regularities. Model Selection criteria are defined to strike a balance between goodness of fit and model complexity, favoring models that generalize well beyond the observed data.
Latest papers
Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation
Synthetic users are cost-effective proxies for real users in the evaluation of conversational recommender systems.
Detection of Unobserved Common Causes based on NML Code in Discrete, Mixed, and Continuous Variables
Causal discovery in the presence of unobserved common causes from observational data only is a crucial but challenging problem.
On the Model-Agnostic Multi-Source-Free Unsupervised Domain Adaptation
Specifically, we first conduct source model selection based on the proposed selection principles.
Defining Expertise: Applications to Treatment Effect Estimation
An expert's actions thus naturally encode part of their domain knowledge and can help make inferences within the same domain: knowing that doctors try to prescribe the best treatment for their patients, we can infer that treatments prescribed more frequently are likely to be more effective.
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Toward a more scalable, robust, and fine-grained evaluation, we implement six reframing operations to construct evolving instances that test LLMs against diverse queries and data noise and probe their problem-solving sub-abilities.
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
Large language models (LLMs) have recently received considerable attention as alternative solutions for task planning.
Model Assessment and Selection under Temporal Distribution Shift
We investigate model assessment and selection in a changing environment, by synthesizing datasets from both the current time period and historical epochs.
MT-HCCAR: Multi-Task Deep Learning with Hierarchical Classification and Attention-based Regression for Cloud Property Retrieval
In response, this paper introduces MT-HCCAR, an end-to-end deep learning model employing multi-task learning to simultaneously tackle cloud masking, cloud phase retrieval (classification tasks), and COT prediction (a regression task).
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning
Despite this, their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
Valid causal inference with unobserved confounding in high-dimensional settings
We propose uncertainty intervals which allow for unobserved confounding, and show that the resulting inference is valid when the amount of unobserved confounding is small relative to the sample size; the latter is formalized in terms of convergence rates.