TY - JOUR
T1 - Sensor data management in the cloud
T2 - Data storage, data ingestion, and data retrieval
AU - Sangat, Prajwol
AU - Indrawan-Santiago, Maria
AU - Taniar, David
PY - 2018
Y1 - 2018
N2 - Sensors are widely used in manufacturing, railways, aerospace, cars, medicine, robotics, and many other aspects of everyday life. There is an increasing need to capture, store, and analyse the dynamic semi-structured data from those sensors. A similar growth of semi-structured data on the modern web has led to the creation of NoSQL data stores for scalability, availability, and performance, and of large-scale data processing frameworks for parallel analysis. NoSQL data stores such as MongoDB and data processing frameworks such as Apache Hadoop have been studied for scientific data analysis. However, there has been no study of MongoDB with Apache Spark, and there is limited understanding of how sensor data management can benefit from these technologies, specifically for ingesting high-velocity sensor data and retrieving high-volume data in parallel. In this paper, we evaluate the performance of MongoDB databases with and without sharding, in combination with Apache Spark, to identify the right software environment for sensor data management.
AB - Sensors are widely used in manufacturing, railways, aerospace, cars, medicine, robotics, and many other aspects of everyday life. There is an increasing need to capture, store, and analyse the dynamic semi-structured data from those sensors. A similar growth of semi-structured data on the modern web has led to the creation of NoSQL data stores for scalability, availability, and performance, and of large-scale data processing frameworks for parallel analysis. NoSQL data stores such as MongoDB and data processing frameworks such as Apache Hadoop have been studied for scientific data analysis. However, there has been no study of MongoDB with Apache Spark, and there is limited understanding of how sensor data management can benefit from these technologies, specifically for ingesting high-velocity sensor data and retrieving high-volume data in parallel. In this paper, we evaluate the performance of MongoDB databases with and without sharding, in combination with Apache Spark, to identify the right software environment for sensor data management.
KW - Apache Spark
KW - data ingestion
KW - data retrieval
KW - data storage
KW - MongoDB
KW - sensor data management
UR - http://www.scopus.com/inward/record.url?scp=85037151941&partnerID=8YFLogxK
U2 - 10.1002/cpe.4354
DO - 10.1002/cpe.4354
M3 - Article
AN - SCOPUS:85037151941
VL - 30
SP - 1
EP - 10
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
SN - 1532-0626
IS - 1
M1 - e4354
ER -