Sensor data management in the cloud

Data storage, data ingestion, and data retrieval

    Research output: Contribution to journal › Article › Research › peer-review

    6 Citations (Scopus)

    Abstract

    Sensors are widely used in manufacturing, railways, aerospace, cars, medicine, robotics, and many other aspects of everyday life. There is an increasing need to capture, store, and analyse the dynamic semi-structured data from those sensors. A similar growth of semi-structured data on the modern web has led to the creation of NoSQL data stores for scalability, availability, and performance, and of large-scale data processing frameworks for parallel analysis. NoSQL data stores such as MongoDB and data processing frameworks such as Apache Hadoop have been studied for scientific data analysis. However, there has been no study of MongoDB with Apache Spark, and there is limited understanding of how sensor data management can benefit from these technologies, specifically for ingesting high-velocity sensor data and retrieving high-volume data in parallel. In this paper, we evaluate the performance of sharded and non-sharded MongoDB databases with Apache Spark to identify the right software environment for sensor data management.

    Original language: English
    Article number: e4354
    Pages (from-to): 1-10
    Number of pages: 10
    Journal: Concurrency Computation
    Volume: 30
    Issue number: 1
    DOI: 10.1002/cpe.4354
    Publication status: Published - 2018

    Keywords

    • Apache Spark
    • data ingestion
    • data retrieval
    • data storage
    • MongoDB
    • sensor data management

    Cite this

    @article{d80fea1a80be45de800a8c30da689b9d,
    title = "Sensor data management in the cloud: Data storage, data ingestion, and data retrieval",
    abstract = "Sensors are widely used in manufacturing, railways, aerospace, cars, medicine, robotics, and many other aspects of everyday life. There is an increasing need to capture, store, and analyse the dynamic semi-structured data from those sensors. A similar growth of semi-structured data on the modern web has led to the creation of NoSQL data stores for scalability, availability, and performance, and of large-scale data processing frameworks for parallel analysis. NoSQL data stores such as MongoDB and data processing frameworks such as Apache Hadoop have been studied for scientific data analysis. However, there has been no study of MongoDB with Apache Spark, and there is limited understanding of how sensor data management can benefit from these technologies, specifically for ingesting high-velocity sensor data and retrieving high-volume data in parallel. In this paper, we evaluate the performance of sharded and non-sharded MongoDB databases with Apache Spark to identify the right software environment for sensor data management.",
    keywords = "Apache Spark, data ingestion, data retrieval, data storage, MongoDB, sensor data management",
    author = "Prajwol Sangat and Maria Indrawan-Santiago and David Taniar",
    year = "2018",
    doi = "10.1002/cpe.4354",
    language = "English",
    volume = "30",
    pages = "1--10",
    journal = "Concurrency and Computation-Practice \& Experience",
    issn = "1532-0626",
    publisher = "Wiley-Blackwell",
    number = "1",

    }

    Sensor data management in the cloud: Data storage, data ingestion, and data retrieval. / Sangat, Prajwol; Indrawan-Santiago, Maria; Taniar, David.

    In: Concurrency Computation, Vol. 30, No. 1, e4354, 2018, p. 1-10.

    TY  - JOUR
    T1  - Sensor data management in the cloud
    T2  - Data storage, data ingestion, and data retrieval
    AU  - Sangat, Prajwol
    AU  - Indrawan-Santiago, Maria
    AU  - Taniar, David
    PY  - 2018
    Y1  - 2018
    AB  - Sensors are widely used in manufacturing, railways, aerospace, cars, medicine, robotics, and many other aspects of everyday life. There is an increasing need to capture, store, and analyse the dynamic semi-structured data from those sensors. A similar growth of semi-structured data on the modern web has led to the creation of NoSQL data stores for scalability, availability, and performance, and of large-scale data processing frameworks for parallel analysis. NoSQL data stores such as MongoDB and data processing frameworks such as Apache Hadoop have been studied for scientific data analysis. However, there has been no study of MongoDB with Apache Spark, and there is limited understanding of how sensor data management can benefit from these technologies, specifically for ingesting high-velocity sensor data and retrieving high-volume data in parallel. In this paper, we evaluate the performance of sharded and non-sharded MongoDB databases with Apache Spark to identify the right software environment for sensor data management.
    KW  - Apache Spark
    KW  - data ingestion
    KW  - data retrieval
    KW  - data storage
    KW  - MongoDB
    KW  - sensor data management
    UR  - http://www.scopus.com/inward/record.url?scp=85037151941&partnerID=8YFLogxK
    U2  - 10.1002/cpe.4354
    DO  - 10.1002/cpe.4354
    M3  - Article
    VL  - 30
    SP  - 1
    EP  - 10
    JO  - Concurrency and Computation-Practice & Experience
    JF  - Concurrency and Computation-Practice & Experience
    SN  - 1532-0626
    IS  - 1
    M1  - e4354
    ER  -