内容简介
大数据分析是检视庞大的复杂数据集的过程,这些数据集通常超出了你所拥有的计算能力。R语言作为数据科学的领军编程语言,包含了诸多功能强大的函数,足以解决大数据处理相关的所有问题。
《大数据分析:R语言实现(影印版 英文版)》首先简要叙述了大数据领域及其当前的行业标准.然后介绍了R语言的发展、结构、现实应用和不足之处,接着引入了用于数据管理和转换的主要R函数的修订版。读者会了解至U基于云的大数据解决方案(例如Amazon EC2实例和Amazon RDS,Microsoft Azure及其HDInsight集群)以及R与关系/非关系数据库(如MongoDB和HBase)之间如何建立连接。除此之外,进一步涵盖了大数据工具,如ApacheHadoop、HDFS和MapReduce,还有其他一些R兼容工具,如Apache Spark及其机器学习库Spark MLlib、H2O。
作者简介
Simon Walkowiak,a cognitive neuroscientist and a managing director of Mind Project Ltd - a Big Data and Predictive Analytics consultancy based in London, United Kingdom. As a former data curator at the UK Data Service (UKDS, University of Essex) - European largest socio-economic data repository, Simon has an extensive experience in processing and managing large-scale datasets such as censuses, sensor and smart meter data, telecommunication data and well-known governmental and social surveys such as the British Social Attitudes survey, Labour Force surveys, Understanding Society, National Travel survey, and many other socio-economic datasets collected and deposited by Eurostat, World Bank, Office for National Statistics, Department of Transport, NatyCen and International Energy Agency, to mention just a few. Simon has delivered numerous data science and R training courses at public institutions and international comparniues. He has also taught a course in Big Data Methods in R at major UK universities and at the prestigious Big Data and Analyhcs Summer School organized by the Institute of Analytics and Data Saence (IADS),
内页插图
目录
Preface
Chapter 1: The Era of Big Data
Big Data - The monster re-defined
Big Data toolbox - dealing with the giant
Hadoop - the elephant in the room
Databases
Hadoop Spark-ed up
R- The unsung Big Data hero
Summary
Chapter 2: Introduction to R Programming Language and Statistical Environment
Learning R
Revisiting R basics
Getting R and RStudio ready
Setting the URLs to R repositories
R data structures
Vectors
Scalars
Matrices
Arrays
Data frames
Lists
Exporting R data objects
Applied data science with R
Importing data from different formats
Exploratory Data Analysis
Data aggregations and contingency tables
Hypothesis testing and statistical inference
Tests of differences
Independent t-test example (with power and effect size estimates)
ANOVA example
Tests of relationships
An example of Pearson's r correlations
Multiple regression example
Data visualization packages
Summary
Chapter 3: Unleashing the Power of R from Within
Traditional limitations of R
Out-of-memory data
Processing speed
To the memory limits and beyond
Data transformations and aggregations with the ff and ffbase packages
Generalized linear models with the ff and ffbase packages
Logistic regression example with ffbase and biglm
Expanding memory with the bigmemory package
Parallel R
From bigmemory to faster computations
An apply() example with the big.matrix object
A for() loop example with the ffdf object
Using apply() and for() loop examples on a data.frame
A parallel package example
A foreach package example
The future of parallel processing in R
Utilizing Graphics Processing Units with R
Multi-threading with Microsoft R Open distribution
Parallel machine learning with H20 and R
Boosting R performance with the data.table package and other tools
Fast data import and manipulation with the data.table package
Data import with data.table
Lightning-fast subsets and aggregations on data.table
Chaining, more complex aggregations, and pivot tables with data.table
Writing better R code
Summary
Chapter 4: Hadoop and MapReduce Framework for R
Hadoop architecture
Hadoop Distributed File System
MapReduce framework
A simple MapReduce word count example
Other Hadoop native tools
Learning Hadoop
A single-node Hadoop in Cloud
Deploying Hortonworks Sandbox on Azure
A word count example in Hadoop using Java
A word count example in Hadoop using the R language
RStudio Server on a Linux RedHat/CentOS virtual machine
Installing and configuring RHadoop packages
HDFS management and MapReduce in R - a word count example
HDInsight - a multi-node Hadoop cluster on Azure
Creating your first HDInsight cluster
Creating a new Resource Group
Deploying a Virtual Network
Creating a Network Security Group
Setting up and configuring an HDInsight cluster
Starting the cluster and exploring Ambari
Connecting to the HDInsight cluster and installing RStudio Server
Adding a new inbound security rule for port 8787
Editing the Virtual Network's public IP address for the head node
Smart energy meter readings analysis example - using R on HDInsight cluster
Summary
Chapter 5: R with Relational Database Management Systems (RDBMSs)
Relational Database Management Systems (RDBMSs)
A short overview of used RDBMSs
Structured Query Language (SQL)
SQLite with R
Preparing and importing data into a local SQLite database
Connecting to SQLite from RStudio
MariaDB with R on a Amazon EC2 instance
Preparing the EC2 instance and RStudio Server for use
Preparing MariaDB and data for use
Working with MariaDB from RStudio
PostgreSQL with R on Amazon RDS
Launching an Amazon RDS database instance
Preparing and uploading data to Amazon RDS
Remotely querying PostgreSQL on Amazon RDS from RStudio
Summary
Chapter 6: R with Non-Relational (NoSQL) Databases
Introduction to NoSQL databases
Review of leading non-relational databases
MongoDB with R
Introduction to MongoDB
MongoDB data models
Installing MongoDB with R on Amazon EC2
Processing Big Data using MongoDB with R
Importing data into MongoDB and basic MongoDB commands
MongoDB with R using the rmongodb package
MongoDB with R using the RMongo package
MongoDB with R using the mongolite package
HBase with R
Azure HDInsight with HBase and RStudio Server
Importing the data to HDFS and HBase
Reading and querying HBase using the rhbase package
Summary
Chapter 7: Faster than Hadoop - Spark with R
Spark for Big Data analytics
Spark with R on a multi-node HDInsight cluster
Launching HDInsight with Spark and R/RStudio
Reading the data into HDFS and Hive
Getting the data into HDFS
Importing data from HDFS to Hive
Bay Area Bike Share analysis using SparkR
Summary
Chapter 8: Machine Learning Methods for Big Data in R
What is machine learning?
Supervised and unsupervised machine learning methods
Classification and clustering algorithms
Machine learning methods with R
Big Data machine learning tools
GLM example with Spark and R on the HDInsight cluster
Preparing the Spark cluster and reading the data from HDFS
Logistic regression in Spark with R
Naive Bayes with H20 on Hadoop with R
Running an H2O instance on Hadoop with R
Reading and exploring the data in H2O
Naive Bayes on H2O with R
Neural Networks with H2O on Hadoop with R
How do Neural Networks work?
Running Deep Learning models on H20
Summary
Chapter 9: The Future of R - Big, Fast, and Smart Data
The current state of Big Data analytics with R
Out-of-memory data on a single machine
Faster data processing with R
Hadoop with R
Spark with R
R with databases
Machine learning with R
The future of R
Big Data
Fast data
Smart data
Where to go next
Summary
Index
大数据分析:R语言实现(影印版 英文版) [Big data analytics with R] epub pdf mobi txt 电子书 下载 2025
大数据分析:R语言实现(影印版 英文版) [Big data analytics with R] 下载 epub mobi pdf txt 电子书 2025
大数据分析:R语言实现(影印版 英文版) [Big data analytics with R] mobi pdf epub txt 电子书 下载 2025
大数据分析:R语言实现(影印版 英文版) [Big data analytics with R] epub pdf mobi txt 电子书 下载 2025