Wednesday, March 28, 2012

Big Data

What is Big Data?

Previously, we thought of data as a set of relational databases, neatly packed into tables.

Today, we have data coming from the internet, social networking sites, online shopping sites, mobile devices, cell phones and text messaging, just to name a few sources.  Where does this data go?  How can it be stored, indexed, packaged and queried, and how can any one technology keep up with the volume?  All of these questions have led to the term "Big Data".

Big Data is defined by Wikipedia as "data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set."

What are "NoSQL" databases? Why are they important?

NoSQL databases are structured very differently from traditional relational databases. Key characteristics of a "NoSQL" database include:

  • Easy to use in conventional load-balanced clusters
  • Persistent data (not just caches)
  • Scale to available memory
  • Have no fixed schemas and allow schema migration without downtime
  • Have individual query systems rather than using a standard query language
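
To make the "no fixed schemas" point concrete, here is a minimal Python sketch. It is only an illustration, not the API of any real product: the `users` collection and the `find` helper are made up for this example, and a real store would persist the data and spread it across a cluster.

```python
# Hypothetical in-memory "collection": each record is a free-form dict,
# so records in the same collection may carry different fields.
users = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},  # new field, no migration step
]

def find(collection, **criteria):
    """Return records matching every criterion; a missing field never matches."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

# Older records simply lack the newer field.
with_email = [doc["name"] for doc in users if "email" in doc]
ann = find(users, name="Ann")
```

Adding the `email` field did not require changing the older record, which is the property the list above describes as schema migration without downtime.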

Three key drivers have created an interest in Big Data and NoSQL:
  • Data from Social Networking and Web 2.0 sites
  • Data changes over time, and many data models don't evolve to keep pace with those changes
  • NoSQL technology is becoming a commodity, so everyone can get it and use it relatively easily

The following table and website compare the different companies that are offering NoSQL technologies, including Amazon Web Services, Google Bigtable and MongoDB.


What is Hadoop?
A current leader in NoSQL technology for handling Big Data is the Apache™ Hadoop™ project, which has developed open-source software for reliable, scalable, distributed computing.

Hadoop's software library allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
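
The "simple programming model" here is MapReduce. The sketch below imitates the idea in plain Python on a single machine. It does not use Hadoop's actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, chosen for illustration. In a real cluster the map and reduce steps would run in parallel on many computers and the framework would handle the grouping in between.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data is big", "data is everywhere"])
```

The programmer only writes the map and reduce functions; distribution, grouping and fault handling are left to the framework, which is what makes the model simple to use.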

What is Pig?

Apache Pig is a platform for analyzing large data sets. Pig's language, Pig Latin, lets you specify a sequence of data transformations such as merging data sets, filtering them, and applying functions to records or groups of records.
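
The kind of transformation sequence described above (filter, group, then apply a function to each group) can be imitated in plain Python. This is not Pig Latin and does not run on Hadoop; the `visits` records and their fields are invented for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (user, page, seconds_on_page)
visits = [
    ("ann", "home", 12),
    ("bob", "cart", 45),
    ("ann", "cart", 30),
    ("bob", "home", 5),
]

# Step 1, filter: keep only visits of at least 10 seconds.
long_visits = [visit for visit in visits if visit[2] >= 10]

# Step 2, group: groupby needs sorted input, so sort by user first.
by_user = groupby(sorted(long_visits, key=itemgetter(0)), key=itemgetter(0))

# Step 3, apply a function to each group: total seconds per user.
total_seconds = {user: sum(row[2] for row in rows) for user, rows in by_user}
```

Each step takes a data set and produces a new one, which is the style of data flow that Pig Latin scripts express.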

What is Hive?


Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

Big Data is a growing problem and a growing opportunity.  With so much data being transmitted through so many different means, companies are constantly looking for ways to manage the increasing volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources) of data.






Wednesday, March 21, 2012

Lean Primer - Summary and Review

Lean Primer by Craig Larman and Bas Vodde


This article describes "lean thinking" and the "Toyota Way".  Lean thinking is a proven system for product development and production that is used by Toyota and other companies.


The best way to describe lean thinking is to view the following graphic:


The foundation of the house is management and its long-term philosophy.  The "pillars" are two important priorities for Toyota: "Respect for People" and "Continuous Improvement".  The "heart" is a combination of Product Development and the 14 Principles.  Finally, on top is the roof, which represents the successes of this process.


The authors see "lean" as a broad system that spans all groups and functions of a company: product development, sales, production, IT and HR.  The chairman of Toyota believes that these tools must be practiced consistently and used every day in order to be effective.


I think this type of lean thinking can be extremely effective if implemented correctly, but I feel the implementation would be very difficult.


It feels very rigid to me and doesn't allow for flexibility in thought, management style, skill set or philosophy.  An organization would have to be very strict in its rules and regulations for this type of "lean" thinking to work.  If it does work, however, I think it can be very successful because every part of the organization is working toward the same goals and end results.

What is BPM? A Summary and Review

Summary of the article "What is Business Process Management?" by Michael Hammer


Michael Hammer writes, "Business Process Management (BPM) is a comprehensive system for managing and transforming organizational operations, based on what is arguably the first set of new ideas about organizational performance since the Industrial Revolution."


1.  Origins of BPM
Two primary schools of thought led to today's BPM systems.  The first is the work of Shewhart and Deming on statistical process control, which led to today's more sophisticated Six Sigma.  These systems use performance metrics to determine whether work is performed satisfactorily, relying on hard data rather than opinion.  Their limitation is that their definition of a process leads an organization to have hundreds of processes, which may or may not be significant to the enterprise as a whole, yet all have to be measured and evaluated.


The second school of thought was the author's own, which Michael Hammer called Reengineering.  Reengineering redefined a process as end-to-end work across an enterprise that creates customer value, and focused only on the meaningful processes.


2.  The Process Management Cycle
Over the last decade, these two approaches have merged to give us Business Process Management.




The diagram above shows the process management cycle beginning at the bottom.


A process can fail because of a flaw in its design, in its execution, or in both.  Once the root of the problem is found, it is easy to fix.


The process is built on the premise that customer value is created through deliberate management of the end-to-end process.  BPM is customer-centered.  Customer, results and process form a triangle that organizations should give attention to.


3.  The Payoffs of Process Management
The enterprise benefits through improvements in consistency, cost, speed, quality and service, which in turn improve customer satisfaction.
Recent examples of companies that have benefited from BPM include:
-Consumer goods manufacturers that reduced inventory by 25% while 'out of stock' situations declined by 50%.
-A computer maker that reduced time to market by 75% and development costs by 45%, and increased customer satisfaction by 25%.


4.  Enablers of the Process
Companies need to have five critical enablers in place for BPM to work effectively:
     1.  Process design
     2.  Process metrics - targets need to be set and performance measured
     3.  Process Performers - people with certain skill sets
     4.  Process Infrastructure - HR and IT departments that work together
     5.  Process Owner - senior managers with authority to make sure processes are carried out throughout the organization.


5.  Organizational Capabilities
Four critical capabilities are needed to ensure success with the processes.
     1.  Leadership - passionate senior leadership to support efforts.
     2.  Culture - the culture of an organization should support the process, with people at all levels involved.
     3.  Governance - process owners, executive leaders and other senior managers.
     4.  Expertise - companies need people with expertise in process design and implementation


6.  The Principles of Process Management
     1.  All work is process work - this includes development and creative work.  Processes should not be misinterpreted as routinization or automation.
     2.  Any process is better than no process - but a process must be well defined.
     3.  A good process is better than a bad process - a bad process must be replaced
     4.  One process version is better than many - standardization across the enterprise allows companies to give better support services and to redeploy staff to other parts of the business.
     5.  Even a good process must be performed effectively 
     6.  Even a good process can be made better
     7.  Every good process eventually becomes a bad process - no process stays effective forever, and each will eventually need to be replaced.


7.  The EPM as a Management Tool and BPMS
EPM stands for Enterprise Process Model, a graphical representation of an enterprise's processes.






The graphic above is an example of an Enterprise Process Model


8.  Frontiers of BPM
Despite its widespread adoption, BPM is still in its infancy.  Challenges for the future include management structure and responsibility, IT support, inter-enterprise processes, standards, process and strategy, and industry structure.


There is much work to be done in the area of BPM but it is definitely "the wave of the present and we are in the Age of Process."


Review by Farah:
I think Hammer is a leader in his field and his analysis made so much sense to me.  


I think the biggest hurdle in the broad acceptance and implementation of BPM will be finding enough qualified people throughout an organization to get across-the-board buy-in: senior executives who are willing to implement and monitor such a process, and skilled IT and HR people who are willing to work with it.


I am a huge believer in proper training and implementation.  I have seen so many great projects, ideas and processes put into place, only to be followed by very poor training and a lack of documentation.  The ball is dropped and the project never really works as intended.  Members of an organization must have ongoing training at every level, and people should be in place to tweak, retweak and make sure the process is implemented effectively on an ongoing basis.