The next step is to detect the number of students in the class. This step is important because it narrows the search for raised hands: only students raise their hands, and if we know where each student's face is, it is easy to estimate where a raised hand would appear. A system can then keep looking for hand-raising events only in those locations (called regions of interest, or ROIs), which simplifies the problem a little.
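To make the ROI idea concrete, here is a minimal sketch of how a hand-search region could be derived from each face box. The face coordinates, the frame size, and the two scale factors (`sideScale`, `upScale`) are assumptions chosen only for illustration; they are not values I tuned for this project.

```matlab
% Hypothetical face boxes in [x y width height] format, one row per student.
faces = [100 200 40 50;
          10  20 30 30];
imgSize = [480 640];   % [rows cols] of the frame (assumed)

% Assumed margins: how far the ROI extends, in multiples of the face size.
sideScale = 1.5;   % face widths added on each side of the face
upScale   = 2.0;   % face heights added above the face

% Corners of the enlarged region around and above each face.
x1 = faces(:,1) - sideScale * faces(:,3);
y1 = faces(:,2) - upScale   * faces(:,4);
x2 = faces(:,1) + (1 + sideScale) * faces(:,3);
y2 = faces(:,2) + faces(:,4);

% Clip to the image bounds (pixel coordinates start at 1).
x1 = max(x1, 1);
y1 = max(y1, 1);
x2 = min(x2, imgSize(2));
y2 = min(y2, imgSize(1));

% Convert back to [x y width height], one ROI per face.
rois = [x1, y1, x2 - x1, y2 - y1];
disp(rois);
```

Each row of `rois` is the window in which a hand-raise detector would look for that student. The second face sits near the image corner, so its ROI is clipped to the frame.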
In the research community, face detection is largely treated as a solved problem, and very good algorithms are available that can be used directly and give good results in most environments. The classical approach is the Viola-Jones detector, which uses a cascade of classifiers over Haar-like features and is one of the most cited works in computer vision. Matlab also provides an API for face detection that uses the same algorithm. I simply used it, and it gives really good results, with some false positives now and then.
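% Create a Viola-Jones cascade detector (the default model detects frontal faces).
faceDetector = vision.CascadeObjectDetector();
% Each row of bbox is one detected face in [x y width height] format.
bbox = step(faceDetector, image);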
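As a rough sketch of how this output could be used, the snippet below counts the detected faces and draws them on the frame. It assumes the Computer Vision Toolbox is installed and uses `visionteam.jpg`, a sample image shipped with that toolbox, as a stand-in for a classroom frame. The `MergeThreshold` value is an assumption, not a setting I tuned: raising it requires more overlapping detections per face, which may suppress some of the false positives at the cost of missing a few faces.

```matlab
% Stand-in frame: a sample image shipped with the Computer Vision Toolbox.
frame = imread('visionteam.jpg');

% Stricter detector: a higher MergeThreshold (default 4) can reduce
% false positives. The value 8 is only an illustrative guess.
faceDetector = vision.CascadeObjectDetector('MergeThreshold', 8);
bbox = step(faceDetector, frame);   % M-by-4 matrix, rows are [x y width height]

% One row per detected face, so the row count estimates the number of students.
numStudents = size(bbox, 1);
fprintf('Detected %d faces\n', numStudents);

% Draw the detections on the frame for a visual check.
annotated = insertShape(frame, 'Rectangle', bbox);
imshow(annotated);
```

The count is only as good as the detector, so any remaining false positives or missed faces will shift it.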