Tuesday, August 20, 2013

What is Hadoop anyway?

Hadoop will change the way businesses think about storage, processing and the value of ‘big’ data.

Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF). Hadoop enables the user to extract valuable business insight from massive amounts of structured and unstructured data quickly and cost-effectively through three main functions:

Processing – MapReduce. Computation in Hadoop is based on the MapReduce paradigm, which splits a job into map tasks and reduce tasks and distributes them across a cluster of coordinated “nodes.”
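To make the paradigm concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only imitates the map, shuffle and reduce stages in memory. It does not use Hadoop's actual Java API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big insight", "data on commodity servers"]
    print(word_count(docs))
```

On a real cluster the map calls run in parallel on the nodes that hold the input data, and the shuffle moves intermediate pairs over the network to the reducers.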

Storage – HDFS. Storage is accomplished with the Hadoop Distributed File System (HDFS) – a reliable file system that allows large volumes of data to be stored and accessed across large clusters of commodity servers.
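HDFS stores each file as a sequence of large blocks and keeps several copies of every block on different nodes. The short Python calculation below shows how many blocks a file occupies and how much raw disk it consumes. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), which a given cluster may override.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=128 * 1024**2, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Assumes every block is stored `replication` times. The last block only
    occupies as much disk as the data it actually holds, so raw usage is
    simply file size times replication.
    """
    # Ceiling division without floats: a partial last block still counts.
    blocks = -(-file_size_bytes // block_size_bytes)
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes


if __name__ == "__main__":
    one_tib = 1024**4
    blocks, raw = hdfs_footprint(one_tib)
    print(blocks, raw / one_tib)
```

For a 1 TiB file this gives 8,192 blocks and 3 TiB of raw storage, which is the price HDFS pays for surviving the loss of individual commodity servers.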

Resource Management – YARN. Coming in Hadoop 2.0, YARN takes over cluster resource management, further increasing efficiency, and extends Hadoop beyond MapReduce by supporting non-MapReduce workloads such as graph, streaming, in-memory and MPI processing.

Hadoop is designed to scale up or down without system interruption, and it runs on commodity hardware, making the capture and processing of big data economically viable for the enterprise.

“By 2015, I believe that 50% of the world’s data will be stored and analyzed by Apache Hadoop.” 

Friday, August 2, 2013

How to do Speech Analytics on Big Data?

To do audio analytics or speech analytics, you first need to convert all audio files into text. Converting speech to text is an important step towards turning unstructured information into structured information. Once the text has been produced, we can extract information from it (people names, company names, product names, the relations among them, etc.).
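Once a speech-to-text engine has produced a transcript (that step is not shown here), one very simple way to pull out names is a dictionary (gazetteer) lookup. The Python sketch below assumes a small hypothetical `GAZETTEER` of known people, companies and products, and treats two entities appearing in the same transcript as a naive stand-in for a relation. A production system would more likely use a trained named-entity recognizer.

```python
import re

# Hypothetical gazetteer: in practice these lists might come from a
# customer or product database, or be replaced by a trained NER model.
GAZETTEER = {
    "PERSON": ["John Smith", "Mary Jones"],
    "COMPANY": ["Acme Corp"],
    "PRODUCT": ["Widget Pro"],
}


def extract_entities(transcript):
    """Return (entity_type, text) pairs in the order they occur in the transcript."""
    found = []
    for entity_type, names in GAZETTEER.items():
        for name in names:
            # Whole-word, case-insensitive match of each known name.
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, transcript, flags=re.IGNORECASE):
                found.append((match.start(), entity_type, match.group(0)))
    found.sort()  # order by position in the transcript
    return [(entity_type, text) for _, entity_type, text in found]


def co_mentions(entities):
    """Naive relation hint: pair each person with each other entity in the same transcript."""
    people = [text for entity_type, text in entities if entity_type == "PERSON"]
    others = [text for entity_type, text in entities if entity_type != "PERSON"]
    return [(person, other) for person in people for other in others]


if __name__ == "__main__":
    text = "Hi, this is John Smith from Acme Corp calling about the Widget Pro."
    entities = extract_entities(text)
    print(entities)
    print(co_mentions(entities))
```

Because each transcript is processed independently, a step like this fits naturally into the map side of a Hadoop job run over millions of call transcripts.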

Stay tuned for more updates on Speech/Audio Analytics.