Sampling of highly correlated data for polynomial regression and model discovery

Grace W Rumantir, Chris S Wallace

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

    5 Citations (Scopus)


    The usual way of conducting empirical comparisons among competing polynomial model selection criteria is to generate artificial data from constructed true models with specified link weights. The robustness of each model selection criterion is then judged by its ability to recover the true model from sample data sets of varying size and noise level.
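The comparison procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' experimental setup: the true model, its weights, the pool of candidate terms, and the use of BIC as the selection criterion are all assumptions chosen for the example, since the abstract does not name the criteria that were compared.

```python
import itertools
import numpy as np

# Hypothetical true model: y = 2.0*x1 - 1.5*x1*x2 + noise (weights are assumptions).
TRUE_TERMS = ("x1", "x1*x2")
TRUE_WEIGHTS = (2.0, -1.5)


def candidate_terms(x1, x2):
    """Hypothetical pool of polynomial terms up to second order."""
    return {"x1": x1, "x2": x2, "x1^2": x1**2, "x2^2": x2**2, "x1*x2": x1 * x2}


def bic(y, X):
    """BIC of a least-squares fit under Gaussian noise (lower is better)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def select_model(n, noise_sd, rng):
    """Generate one data set from the true model and return the best term subset."""
    x1, x2 = rng.normal(size=(2, n))
    terms = candidate_terms(x1, x2)
    y = sum(w * terms[t] for t, w in zip(TRUE_TERMS, TRUE_WEIGHTS))
    y = y + rng.normal(scale=noise_sd, size=n)
    best, best_score = None, np.inf
    names = list(terms)
    # Exhaustive search over all non-empty subsets of the candidate terms.
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([np.ones(n)] + [terms[t] for t in subset])
            score = bic(y, X)
            if score < best_score:
                best, best_score = subset, score
    return best


def recovery_rate(n, noise_sd, trials=50, seed=0):
    """Fraction of trials in which exactly the true terms are selected."""
    rng = np.random.default_rng(seed)
    hits = sum(
        set(select_model(n, noise_sd, rng)) == set(TRUE_TERMS)
        for _ in range(trials)
    )
    return hits / trials
```

Calling `recovery_rate` over a grid of `n` and `noise_sd` values gives one recovery curve per criterion, which is the kind of evidence such comparisons rely on.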

    Suppose we have a set of real multivariate data and have empirically found a polynomial regression model that is, so far, regarded as the right model for that data. We would then like to replicate the multivariate data artificially so that we can run multiple experiments with two objectives: first, to see whether the model selection criteria can recover the model regarded as the right one; second, to find the minimum sample size required to recover it.

    This paper proposes a methodology to replicate real multivariate data using its covariance matrix and a polynomial regression model seen as the right model represented by the data. The sample data sets generated are then used for model discovery experiments.
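One way to picture the replication step is sketched below. It is an illustration under stated assumptions rather than the paper's procedure: the abstract does not say how samples are drawn from the covariance matrix, so a multivariate normal distribution is assumed here, and the function name `replicate`, the two-regressor layout, and the model form `y = w0 + w1*x1 + w2*x1*x2 + noise` are hypothetical.

```python
import numpy as np


def replicate(real_X, weights, noise_sd, n, rng):
    """Draw n synthetic rows that share real_X's mean vector and covariance matrix.

    Assumption: regressors are sampled from a multivariate normal distribution.
    """
    mean = real_X.mean(axis=0)
    cov = np.cov(real_X, rowvar=False)
    X = rng.multivariate_normal(mean, cov, size=n)
    # Hypothetical "right" model: y = w0 + w1*x1 + w2*x1*x2 + Gaussian noise.
    w0, w1, w2 = weights
    y = w0 + w1 * X[:, 0] + w2 * X[:, 0] * X[:, 1]
    y = y + rng.normal(scale=noise_sd, size=n)
    return X, y


# Stand-in for real data: two highly correlated variables (made up for the demo).
rng = np.random.default_rng(0)
z = rng.normal(size=200)
real_X = np.column_stack([z, 0.9 * z + 0.1 * rng.normal(size=200)])

# Replicate the data at a chosen sample size for a model discovery experiment.
X_syn, y_syn = replicate(real_X, weights=(1.0, 2.0, -1.5), noise_sd=0.5, n=1000, rng=rng)
```

Because the synthetic regressors inherit the correlation structure of the original data, repeating the draw at different values of `n` gives a way to probe how large a sample a criterion needs before it recovers the assumed model.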
    Original language: English
    Title of host publication: Advances in Intelligent Data Analysis
    Subtitle of host publication: 4th International Conference, IDA 2001 Cascais, Portugal, September 13-15, 2001 Proceedings
    Editors: Frank Hoffmann, David J. Hand, Niall Adams, Douglas Fisher, Gabriela Guimaraes
    Place of Publication: Berlin, Germany
    Number of pages: 8
    ISBN (Print): 3540425810
    Publication status: Published - 2001
    Event: 4th International Conference on Intelligent Data Analysis, IDA 2001 - Cascais, Portugal
    Duration: 13 Sept 2001 to 15 Sept 2001

    Publication series

    Name: Lecture Notes in Computer Science
    ISSN (Print): 0302-9743


    Conference: 4th International Conference on Intelligent Data Analysis, IDA 2001
