Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-0430104-155106

Document Type: thesis
Author Name: Zhong, Xiao
Title: A Study of Several Statistical Methods for Classification with Application to Microbial Source Tracking
Department: Mathematical Sciences
  • Jayson D. Wilbur, Advisor
Keywords:
  • classification
  • k-nearest-neighbor (k-n-n)
  • neural networks
  • linear discriminant analysis (LDA)
  • support vector machines
  • microbial source tracking (MST)
  • quadratic discriminant analysis (QDA)
  • logistic regression
Date of Presentation/Defense: 2004-05-05
Availability: unrestricted


    With the advent of computers and the information age, many fields of science and industry generate vast amounts of data that call for further statistical exploration. In particular, statistical and computational problems in biology and medicine have given rise to the field of bioinformatics, which is attracting a growing number of statisticians, computer scientists, and biologists.

    Several procedures have been developed for tracing the source of fecal pollution in water resources based on characteristics of certain microorganisms. This collection of techniques has been termed microbial source tracking (MST). Most current MST methods are based on patterns of either phenotypic or genotypic variation in indicator organisms. Studies also suggest that patterns of genotypic variation may be more reliable than those of phenotypic variation because they are less strongly associated with environmental factors. Among the genotypic methods for source tracking, fingerprinting via rep-PCR is the most common. Thus, identifying specific pollution sources in contaminated waters from rep-PCR fingerprints, viewed as a classification problem, has become an increasingly popular research topic in bioinformatics.

    In this project, several statistical methods for classification were studied: linear discriminant analysis, quadratic discriminant analysis, logistic regression, k-nearest-neighbor rules, neural networks, and support vector machines. This project report summarizes each of these methods and the relevant statistical theory. In addition, an application of these methods to a particular set of MST data is presented and comparisons are made.

  • thesis.pdf
