1. Big data basics
1.1 Characteristics of big data:
- volume (from terabytes up to zettabytes)
- variety (structured, polystructured, unstructured)
- velocity (batch, streaming data)
2. Products/Tools
- NoSQL (Accumulo, Aerospike, Alchemy Database, AllegroGraph, Apache CouchDB, ArangoDB, Berkeley DB, Cassandra, Clusterpoint, CortexDB, Couchbase, DocumentDB, Druid, Dynamo, FairCom c-treeACE, FoundationDB, Giraph, HBase, HyperDex, InfiniteGraph, Lotus Notes, MarkLogic, MemcacheDB, MongoDB, MUMPS, Neo4j, Oracle NoSQL Database, OrientDB, Qizx, Redis, RethinkDB, Riak, Stardog, Vertica, Virtuoso)
- MapReduce (Apache Hadoop MapReduce, Disco, DryadLINQ, MATLAB MapReduce, QtConcurrent, Skynet, Splunk, Stratosphere)
- Storage (S3, Hadoop Distributed File System)
- Servers (EC2, Elastic, Google App Engine)
- Processing (BigSheets, Elasticsearch, R, Splunk, Solr/Lucene, Yahoo! Pipes)
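For anyone new to the MapReduce entry in the list above, here is a rough sketch of the idea in plain Python: a word count split into a map step, a grouping ("shuffle") step, and a reduce step. This is only an illustration that runs on one machine, not how Hadoop or any of the listed products implements it, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input, just to show the call.
counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the framework would take care of the grouping step and of moving the data around.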
3. Useful links
- General reference (Wikipedia): https://en.wikipedia.org
- Open source (Apache Hadoop): http://hadoop.apache.org
- Commercial products (IBM): https://www-01.ibm.com/software/data/bigdata/
- Commercial products (Splunk): http://www.splunk.com
Do you have any interesting articles on big data? Please reply :)
Thanks, Andreas