$ perl bin/createsamples.pl positives.txt negatives.txt \
    samples 1154 "opencv_createsamples -bgcolor 0 -bgthresh 0 \
    -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 255 -w 183 -h 212"
  • Download the opencv-2.4.8 source code
    • $ cp src/mergevec.cpp ../opencv-2.4.8/apps/haartraining/
    • $ cd ../opencv-2.4.8/apps/haartraining/
    • $ g++ `pkg-config --libs --cflags opencv` -I. -o mergevec mergevec.cpp \
        cvboost.cpp cvcommon.cpp cvsamples.cpp cvhaarclassifier.cpp \
        cvhaartraining.cpp -lopencv_core -lopencv_calib3d \
        -lopencv_imgproc -lopencv_highgui -lopencv_objdetect
  • Put the mergevec executable (my mergevec file) in the opencv-haar-classifier-training directory
    • $ cp mergevec ../../../opencv-haar-classifier-training/
    • $ cd ../../../opencv-haar-classifier-training/
    • $ find ./samples -name '*.vec' > samples.txt
    • $ ./mergevec samples.txt samples.vec
  • Train the classifier (It takes a couple of days (O_o))
$ opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt \
    -numStages 12 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1154 -numNeg 15 \
    -w 183 -h 212 -mode ALL -precalcValBufSize 1024 -precalcIdxBufSize 1024
Slope One: a non-trivial, item-based, rating-based CF algorithm

  • References
  • Previous Solution
    • using linear regression f(x) = ax+b, leading to severe overfitting
  • Alternative Solution
    • learn a simpler predictor (called Slope One), f(x) = x+b
  • Examples
  • Features
    • take the difference between the ratings of two items, averaged over the users who rated both
    • for each user, predict <item, rating> pairs for the items this user has not rated
    • use the item-to-item differences to predict another user’s rating of those items
    • support both online queries and dynamic updates
    • reduces storage requirements and latency
    • scalable with respect to the number of users
    • deprecated in Mahout 0.8
  • Drawback
    • Predicted ratings for some items are sometimes larger than 5. (Need to figure out what’s wrong here.)
  • Datasets
  • Source Code
  • Practice
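
A minimal sketch of the f(x) = x+b predictor described above, written in plain Python as the weighted Slope One variant. This is not the Mahout implementation; the function names (build_deviations, predict), the lo/hi clamping parameters, and the toy ratings are all assumptions made for illustration. For each pair of items it stores the average rating difference b and the number of users who rated both, then predicts an unrated item as a count-weighted average of (known rating + b).

```python
from collections import defaultdict


def build_deviations(ratings):
    """ratings: {user: {item: rating}}.

    Returns (dev, count): dev[j][i] is the average of (r_j - r_i) over the
    users who rated both j and i, and count[j][i] is how many users did.
    """
    total = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    total[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {j: {i: total[j][i] / count[j][i] for i in total[j]} for j in total}
    return dev, count


def predict(user_ratings, dev, count, lo=None, hi=None):
    """Weighted Slope One predictions for items the user has not rated.

    lo/hi are optional bounds (hypothetical convenience, not part of the
    basic scheme) used to clamp predictions to the rating scale.
    """
    predictions = {}
    for j in dev:
        if j in user_ratings:
            continue
        num, den = 0.0, 0
        for i, r_i in user_ratings.items():
            c = count[j].get(i, 0)
            if c:
                num += (dev[j][i] + r_i) * c  # f(x) = x + b, weighted by support
                den += c
        if den:
            p = num / den
            if lo is not None:
                p = max(lo, p)
            if hi is not None:
                p = min(hi, p)
            predictions[j] = p
    return predictions


# Toy data, invented for this example.
ratings = {
    "John": {"A": 5, "B": 3, "C": 2},
    "Mark": {"A": 3, "B": 4},
    "Lucy": {"B": 2, "C": 5},
}
dev, count = build_deviations(ratings)
print(round(predict(ratings["Lucy"], dev, count)["A"], 2))  # 4.33
```

Nothing in this formula bounds the result: with a single user who rated X as 1 and Y as 5, the deviation of Y over X is 4, so a new user who rated X as 4 gets a predicted Y of 8. That may be one source of the predictions above 5 noted under Drawback; passing lo=1, hi=5 clamps them, which is a workaround rather than a fix to the model. Keeping the running sums and counts instead of only the averages is what would allow cheap dynamic updates when a new rating arrives.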