In-Loop Filtering Techniques

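Like earlier video coding standards, HEVC still uses a block-based hybrid coding framework, so distortions such as blocking artifacts, ringing artifacts, color shift and blurring remain. To reduce them, HEVC adopts in-loop filtering. It resembles a post-processing filter, but it sits inside the coding loop and is applied to the reconstructed picture in the same way in the encoder and the decoder. It consists of the deblocking filter (DBF) and sample adaptive offset (SAO). The DBF serves the same purpose as in H.264, namely removing blocking artifacts, but its decision and filtering processes are considerably simpler than in H.264. SAO is a new tool in HEVC.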

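One point deserves attention: intra prediction takes reconstructed block samples that have not yet been loop filtered as the reference for the next intra prediction, whereas inter prediction uses reconstructed samples after in-loop filtering as the reference picture for motion-compensated prediction. The position of the in-loop filter module in the coding framework confirms this (circled in red in the figure below). There is a reason why only loop-filtered reconstructed samples serve as the reference for later pictures: they make better references, further reduce the prediction residual of subsequently coded samples, and so improve both the subjective and the objective quality of the video.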

The deblocking filter and sample adaptive offset are examined in detail below.

I. Deblocking Filter

The deblocking filter (DBF) reduces blocking artifacts, that is, discontinuities at the boundaries of coded blocks in a picture. Blocking artifacts have three main causes:

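① The transform and quantization of each block are performed independently (this is equivalent to filtering each block with a filter of different parameters), so the magnitude and distribution of the quantization error introduced in each block are independent, which causes discontinuities at the boundaries between adjacent blocks;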

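② In motion-compensated prediction, the predictions of adjacent blocks may come from different positions in different pictures, so the prediction residual signal is discontinuous at block boundaries;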

③ With temporal prediction, boundary discontinuities present in a reference picture may propagate to subsequently coded pictures.

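Because of how blocking artifacts arise, the DBF is applied only to samples at block boundaries, that is, to samples adjacent to PU or TU boundaries. The filter can be configured in the encoder, in the coding-structure configuration file (for example the "Deblock Filter" section of encoder_lowdelay_P_main.cfg, shown below). Note that both PU and TU boundaries have to be considered, because in some inter-predicted CBs the PU boundaries do not necessarily align with the TU boundaries.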

#=========== Deblock Filter ============
DeblockingFilterControlPresent: 0           # Dbl control params present (0=not present, 1=present)
LoopFilterOffsetInPPS         : 0           # Dbl params: 0=varying params in SliceHeader, param = base_param + GOP_offset_param; 1=constant params in PPS, param = base_param)
LoopFilterDisable             : 0           # Disable deblocking filter (0=Filter, 1=No Filter)
LoopFilterBetaOffset_div2     : 0           # base_param: -6 ~ 6
LoopFilterTcOffset_div2       : 0           # base_param: -6 ~ 6
DeblockingFilterMetric        : 0           # blockiness metric (automatically configures deblocking parameters in bitstream)

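The figure below compares the results with and without the DBF. (An aside: in my own tests on the HM platform the effect of the DBF was not very noticeable and the output looked almost unchanged. The reason is that the strength of the deblocking filter depends on many factors, so not every experiment produces results that match the theory exactly.)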

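In H.264 the DBF is applied to the edges of 4x4 blocks, whereas in HEVC it is applied only to edges that lie on an 8x8 sample grid, for both luma and chroma samples. This restriction lowers computational complexity without affecting visual quality, and it eases parallel processing because neighbouring filtering operations can no longer interact.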

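In HEVC the DBF is processed in this order: horizontal filtering is first applied to the vertical edges of the whole picture, and vertical filtering is then applied to the horizontal edges. This order allows the many horizontal filtering operations, or the many vertical ones, to run in parallel. Alternatively the filter can still be run CTB by CTB, which introduces only a small processing delay.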
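The two-pass order on the 8x8 grid can be sketched as follows. This is only an illustration under simplifying assumptions: the `Edge` struct and `candidateEdges` are hypothetical names (not HM types), and the sketch lists every grid edge, whereas a real codec filters an edge only if it is also a PU or TU boundary and the filtering decision selects it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge descriptor, for illustration only (not an HM type).
struct Edge
{
  bool vertical;  // true: vertical edge (filtered horizontally)
  int  pos;       // x for vertical edges, y for horizontal edges (luma samples)
};

// List the candidate deblocking edges of a picture in processing order:
// all vertical edges on the 8x8 grid first, then all horizontal edges.
// The picture border itself (pos == 0) is never filtered, so it is skipped.
std::vector<Edge> candidateEdges(int picWidth, int picHeight)
{
  assert(picWidth > 0 && picHeight > 0);
  const int grid = 8;
  std::vector<Edge> edges;
  for (int x = grid; x < picWidth; x += grid)   // pass 1: vertical edges
  {
    edges.push_back({true, x});
  }
  for (int y = grid; y < picHeight; y += grid)  // pass 2: horizontal edges
  {
    edges.push_back({false, y});
  }
  return edges;
}
```

Because no edge in the first pass depends on another edge of the same pass, the entries of each pass could be handed to separate threads.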

In short, smoothing across block boundaries effectively reduces or removes blocking artifacts.

II. Sample Adaptive Offset

SAO is new in HEVC, so it is the main subject of this study.

Sample adaptive offset (SAO) reduces ringing artifacts. It is applied adaptively to all samples that satisfy certain conditions.

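Ringing is caused by the loss of high-frequency information. HEVC still uses a block-based DCT and quantizes the transform coefficients in the frequency domain. At strong edges in a picture, the quantization distortion of the high-frequency AC coefficients produces ripples around the edge after decoding, known as the Gibbs phenomenon (see the figure below). This distortion is the ringing artifact, and it can severely degrade both the subjective and the objective quality of the video.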

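Since ringing results from the loss of high-frequency information, suppressing it requires reducing the distortion of the high-frequency components, yet simply quantizing those components more finely would inevitably lower compression efficiency.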

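SAO approaches the problem as follows (basic principle): it reduces ringing in the sample domain by adding a negative offset to samples at the peaks of the reconstructed curve and a positive offset to samples at its valleys. Because only the reconstructed picture is available at the decoder, the samples are classified by features of the reconstructed picture and then compensated in the sample domain.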

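In HEVC, SAO works on a CTB basis. A suitable classifier assigns the reconstructed samples to categories, and a different offset is applied to each category, which effectively improves subjective and objective quality. There are two kinds of offset, edge offset (EO) and band offset (BO), and a parameter merge technique is provided in addition.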

(1) Edge Offset (EO)

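The current sample is classified by comparing its value with the values of neighbouring samples, and all samples of the same category receive the same offset. To balance complexity against coding efficiency, edge offset uses a one-dimensional three-sample pattern. Depending on where the neighbouring samples are taken from, there are four patterns: horizontal (EO_0), vertical (EO_1), 135-degree (EO_2) and 45-degree (EO_3). In any one pattern, EO sorts all samples into five categories by a fixed rule. Categories 1 to 4 are compensated, that is, a value (the offset) is added or subtracted, while samples of category 0 are left unchanged. One principle always applies: different categories may use different offsets, but all samples of one category must use the same offset.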

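For edge offset only the absolute value of each offset has to be transmitted; the decoder infers the sign from the category. (Experiments showed that for more than 90% of offsets the sign matches the category, so the sign was fixed per category.)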
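The classification rule is not spelled out above. The following is a minimal sketch, assuming the usual HEVC rule in which the current sample c is compared with its two neighbours a and b along the chosen direction. The names `eoCategory` and `applyEdgeOffset` are made up for this illustration and are not HM functions.

```cpp
#include <cassert>

// Sign of a difference: -1, 0 or +1.
inline int sgn(int v) { return (v > 0) - (v < 0); }

// Classify sample c against neighbours a and b along the EO direction.
// Returns the category 0..4:
//   1: local valley   (c below both neighbours)
//   2: concave corner (c below one neighbour, equal to the other)
//   3: convex corner  (c above one neighbour, equal to the other)
//   4: local peak     (c above both neighbours)
//   0: none of the above, sample is left unchanged
int eoCategory(int a, int c, int b)
{
  const int s = sgn(c - a) + sgn(c - b);  // in the range -2..2
  switch (s)
  {
    case -2: return 1;
    case -1: return 2;
    case  1: return 3;
    case  2: return 4;
    default: return 0;
  }
}

// Apply the transmitted offset magnitudes (indexed by category, entry 0 is
// unused). The sign is inferred from the category: valleys are raised,
// peaks are lowered. The result is clipped to the valid sample range.
int applyEdgeOffset(int a, int c, int b, const int absOffset[5], int bitDepth)
{
  assert(bitDepth >= 1 && bitDepth <= 16);
  const int cat = eoCategory(a, c, b);
  int v = c;
  if (cat == 1 || cat == 2) v += absOffset[cat];
  if (cat == 3 || cat == 4) v -= absOffset[cat];
  const int maxVal = (1 << bitDepth) - 1;
  return v < 0 ? 0 : (v > maxVal ? maxVal : v);
}
```

The sum of two signs is the same quantity that the HM listing further down computes as edgeType, where it is used directly as an index into the offset array after `offset += 2`.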

(2) Band Offset (BO)

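BO classifies samples by intensity: the sample value range is divided evenly into 32 bands. Each band is compensated according to the characteristics of its samples, and all samples in a band use the same offset. HEVC allows a CTB to select only four consecutive bands and compensates only the samples that fall into those four bands. This makes the number of band offsets equal to the number of edge offsets and reduces the line buffer requirement. Which four bands to select can be decided by rate-distortion optimization; the index of the first band and the four offsets are then sent to the decoder.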
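A minimal sketch of the band lookup, assuming that 32 equal bands correspond to the five most significant bits of the sample. `bandIndex` and `applyBandOffset` are hypothetical helper names. The wrap-around modulo 32 mirrors the `% NUM_SAO_BO_CLASSES` in the HM listing further down.

```cpp
#include <cassert>

// Band index of a sample: the range [0, 2^bitDepth) is split into 32 equal
// bands, so the band is given by the 5 most significant bits.
inline int bandIndex(int sample, int bitDepth)
{
  assert(bitDepth >= 5 && sample >= 0 && sample < (1 << bitDepth));
  return sample >> (bitDepth - 5);
}

// Add the offset of the band a sample falls into. Only the four consecutive
// bands starting at startBand carry an offset; all other samples pass
// through unchanged. The result is clipped to the valid sample range.
int applyBandOffset(int sample, int bitDepth, int startBand, const int offsets[4])
{
  assert(startBand >= 0 && startBand < 32);
  // Position of the sample's band relative to the first signalled band.
  const int k = (bandIndex(sample, bitDepth) - startBand + 32) % 32;
  int v = sample;
  if (k < 4) v += offsets[k];
  const int maxVal = (1 << bitDepth) - 1;
  return v < 0 ? 0 : (v > maxVal ? maxVal : v);
}
```

For 8-bit video each band covers 8 sample values, so a sample of 100 lies in band 12.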

(3) SAO Parameter Merging

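Parameter merging (merge) means that a CTB directly reuses the SAO parameters of a neighbouring block; only an indication of which neighbour's parameters are used has to be signalled.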
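As a rough sketch of how merged parameters can be resolved, assume the CTBs are visited in raster order and the candidates are the left and the above neighbour, as in the getMergeList function of the HM listing further down. The types `SaoMode` and `SaoParams` and the function `resolveMerges` are hypothetical, and the slice and tile availability check that HM performs is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified per-CTB SAO parameters (not the HM types).
enum class SaoMode { Off, New, MergeLeft, MergeAbove };

struct SaoParams
{
  SaoMode mode = SaoMode::Off;
  int     type = -1;              // EO pattern or BO, unused when Off
  int     offsets[4] = {0, 0, 0, 0};
};

// Replace every merged entry by a copy of its neighbour's parameters.
// CTBs are visited in raster order, so the left and above neighbours have
// already been resolved when they are copied.
void resolveMerges(std::vector<SaoParams>& ctbs, int ctusPerRow)
{
  assert(ctusPerRow > 0);
  for (int ctu = 0; ctu < static_cast<int>(ctbs.size()); ++ctu)
  {
    SaoParams& p = ctbs[ctu];
    if (p.mode == SaoMode::MergeLeft)
    {
      assert(ctu % ctusPerRow > 0);   // a left neighbour must exist
      p = ctbs[ctu - 1];
    }
    else if (p.mode == SaoMode::MergeAbove)
    {
      assert(ctu >= ctusPerRow);      // an above neighbour must exist
      p = ctbs[ctu - ctusPerRow];
    }
  }
}
```

A merged CTB therefore costs only the merge indication in the bitstream, while the decoder ends up with a full parameter set for it.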

(4) SAO Implementation in HM

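The core of the SAO process is selecting the best SAO parameters by Lagrangian optimization. To reduce computational complexity, the process uses a fast mode decision method. The figure below shows the SAO process for one CTU.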
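The Lagrangian selection mentioned above can be illustrated in its generic form: each candidate parameter set gets a cost J = D + lambda * R, and the cheapest one wins. This is only a sketch of that principle, not the HM decision logic. The `Candidate` struct and `bestCandidate` are hypothetical names, and the distortion and rate values would come from the encoder's statistics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: one possible SAO parameter choice for a CTU.
struct Candidate
{
  double distortion;  // distortion change relative to "SAO off" (negative = better)
  double rateBits;    // bits needed to signal this choice
};

// Return the index of the candidate with the smallest Lagrangian cost
// J = D + lambda * R. Ties keep the earlier candidate.
std::size_t bestCandidate(const std::vector<Candidate>& cands, double lambda)
{
  assert(!cands.empty() && lambda >= 0.0);
  std::size_t best = 0;
  double bestCost = cands[0].distortion + lambda * cands[0].rateBits;
  for (std::size_t i = 1; i < cands.size(); ++i)
  {
    const double cost = cands[i].distortion + lambda * cands[i].rateBits;
    if (cost < bestCost)
    {
      bestCost = cost;
      best = i;
    }
  }
  return best;
}
```

With a large lambda a cheap choice such as merge tends to win even if new offsets would remove more distortion; with a small lambda the balance shifts towards the lower-distortion candidate.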

The HM code that implements SAO is as follows:

TComSampleAdaptiveOffset.cpp

/* The copyright in this software is being made available under the BSD
 * License, included below. This software may be subject to other third party
 * and contributor rights, including patent rights, and no such rights are
 * granted under this license.  
 *
 * Copyright (c) 2010-2014, ITU/ISO/IEC
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *  * Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright notice,
 *    this list of conditions and the following disclaimer in the documentation
 *    and/or other materials provided with the distribution.
 *  * Neither the name of the ITU/ISO/IEC nor the names of its contributors may
 *    be used to endorse or promote products derived from this software without
 *    specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */

/** \file     TComSampleAdaptiveOffset.cpp
    \brief    sample adaptive offset class
*/

#include "TComSampleAdaptiveOffset.h"
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

//! \ingroup TLibCommon
//! \{
UInt g_saoMaxOffsetQVal[NUM_SAO_COMPONENTS];

SAOOffset::SAOOffset()
{ 
  reset();
}

SAOOffset::~SAOOffset()
{

}

Void SAOOffset::reset()
{
  modeIdc = SAO_MODE_OFF;
  typeIdc = -1;
  typeAuxInfo = -1;
  ::memset(offset, 0, sizeof(Int)* MAX_NUM_SAO_CLASSES);
}

const SAOOffset& SAOOffset::operator= (const SAOOffset& src)
{
  modeIdc = src.modeIdc;
  typeIdc = src.typeIdc;
  typeAuxInfo = src.typeAuxInfo;
  ::memcpy(offset, src.offset, sizeof(Int)* MAX_NUM_SAO_CLASSES);

  return *this;
}

SAOBlkParam::SAOBlkParam()
{
  reset();
}

SAOBlkParam::~SAOBlkParam()
{

}

Void SAOBlkParam::reset()
{
  for(Int compIdx=0; compIdx< 3; compIdx++)
  {
    offsetParam[compIdx].reset();
  }
}

const SAOBlkParam& SAOBlkParam::operator= (const SAOBlkParam& src)
{
  for(Int compIdx=0; compIdx< 3; compIdx++)
  {
    offsetParam[compIdx] = src.offsetParam[compIdx];
  }
  return *this;

}

TComSampleAdaptiveOffset::TComSampleAdaptiveOffset()
{
  m_tempPicYuv = NULL;
  for(Int compIdx=0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
  {
    m_offsetClipTable[compIdx] = NULL;
  }
#if !SAO_SGN_FUNC
  m_signTable = NULL; 
#endif
  
  m_lineBufWidth = 0;
  m_signLineBuf1 = NULL;
  m_signLineBuf2 = NULL;
}

TComSampleAdaptiveOffset::~TComSampleAdaptiveOffset()
{
  destroy();
  
  if (m_signLineBuf1) delete[] m_signLineBuf1; m_signLineBuf1 = NULL;
  if (m_signLineBuf2) delete[] m_signLineBuf2; m_signLineBuf2 = NULL;
}

Void TComSampleAdaptiveOffset::create( Int picWidth, Int picHeight, UInt maxCUWidth, UInt maxCUHeight, UInt maxCUDepth )
{
  destroy();

  m_picWidth = picWidth;    
  m_picHeight= picHeight;
  m_maxCUWidth= maxCUWidth; 
  m_maxCUHeight= maxCUHeight;

  m_numCTUInWidth = (m_picWidth/m_maxCUWidth) + ((m_picWidth % m_maxCUWidth)?1:0);
  m_numCTUInHeight= (m_picHeight/m_maxCUHeight) + ((m_picHeight % m_maxCUHeight)?1:0);
  m_numCTUsPic = m_numCTUInHeight*m_numCTUInWidth;

  //temporary picture buffer
  if ( !m_tempPicYuv )
  {
    m_tempPicYuv = new TComPicYuv;
    m_tempPicYuv->create( m_picWidth, m_picHeight, m_maxCUWidth, m_maxCUHeight, maxCUDepth );
  }

  //bit-depth related
  for(Int compIdx =0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
  {
    Int bitDepthSample = (compIdx == SAO_Y)?g_bitDepthY:g_bitDepthC;
    m_offsetStepLog2  [compIdx] = max(bitDepthSample - MAX_SAO_TRUNCATED_BITDEPTH, 0);
    g_saoMaxOffsetQVal[compIdx] = (1<<(min(bitDepthSample,MAX_SAO_TRUNCATED_BITDEPTH)-5))-1; //Table 9-32, inclusive
  }

#if !SAO_SGN_FUNC
  //look-up table for clipping
  Int overallMaxSampleValue=0;
#endif
  for(Int compIdx =0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
  {
    Int bitDepthSample = (compIdx == SAO_Y)?g_bitDepthY:g_bitDepthC; //exclusive
    Int maxSampleValue = (1<< bitDepthSample); //exclusive
    Int maxOffsetValue = (g_saoMaxOffsetQVal[compIdx] << m_offsetStepLog2[compIdx]);
#if !SAO_SGN_FUNC
    if (maxSampleValue>overallMaxSampleValue) overallMaxSampleValue=maxSampleValue;
#endif

    m_offsetClipTable[compIdx] = new Int[(maxSampleValue + maxOffsetValue -1)+ (maxOffsetValue)+1 ]; //positive & negative range plus 0
    m_offsetClip[compIdx] = &(m_offsetClipTable[compIdx][maxOffsetValue]);

    //assign clipped values 
    Int* offsetClipPtr = m_offsetClip[compIdx];
    for(Int k=0; k< maxSampleValue; k++)
    {
      *(offsetClipPtr + k) = k;
    }
    for(Int k=0; k< maxOffsetValue; k++ )
    {
      *(offsetClipPtr + maxSampleValue+ k) = maxSampleValue-1;
      *(offsetClipPtr -k -1 )              = 0;
    }
  }

#if !SAO_SGN_FUNC
  m_signTable = new Short[ 2*(overallMaxSampleValue-1) + 1 ];
  m_sign = &(m_signTable[overallMaxSampleValue-1]);

  m_sign[0] = 0;
  for(Int k=1; k< overallMaxSampleValue; k++)
  {
    m_sign[k] = 1;
    m_sign[-k]= -1;
  }
#endif
}

Void TComSampleAdaptiveOffset::destroy()
{
  if ( m_tempPicYuv )
  {
    m_tempPicYuv->destroy();
    delete m_tempPicYuv;
    m_tempPicYuv = NULL;
  }

  for(Int compIdx=0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
  {
    if(m_offsetClipTable[compIdx])
    {
      delete[] m_offsetClipTable[compIdx]; m_offsetClipTable[compIdx] = NULL;
    }
  }
#if !SAO_SGN_FUNC
  if( m_signTable )
  {
    delete[] m_signTable; m_signTable = NULL;
  }
#endif
}

Void TComSampleAdaptiveOffset::invertQuantOffsets(Int compIdx, Int typeIdc, Int typeAuxInfo, Int* dstOffsets, Int* srcOffsets)
{
  Int codedOffset[MAX_NUM_SAO_CLASSES];

  ::memcpy(codedOffset, srcOffsets, sizeof(Int)*MAX_NUM_SAO_CLASSES);
  ::memset(dstOffsets, 0, sizeof(Int)*MAX_NUM_SAO_CLASSES);

  if(typeIdc == SAO_TYPE_START_BO)
  {
    for(Int i=0; i< 4; i++)
    {
      dstOffsets[(typeAuxInfo+ i)%NUM_SAO_BO_CLASSES] = codedOffset[(typeAuxInfo+ i)%NUM_SAO_BO_CLASSES]*(1<<m_offsetStepLog2[compIdx]);
    }
  }
  else //EO
  {
    for(Int i=0; i< NUM_SAO_EO_CLASSES; i++)
    {
      dstOffsets[i] = codedOffset[i] *(1<<m_offsetStepLog2[compIdx]);
    }
    assert(dstOffsets[SAO_CLASS_EO_PLAIN] == 0); //keep EO plain offset as zero
  }

}

Int TComSampleAdaptiveOffset::getMergeList(TComPic* pic, Int ctu, SAOBlkParam* blkParams, std::vector<SAOBlkParam*>& mergeList)
{
  Int ctuX = ctu % m_numCTUInWidth;
  Int ctuY = ctu / m_numCTUInWidth;
  Int mergedCTUPos;
  Int numValidMergeCandidates = 0;

  for(Int mergeType=0; mergeType< NUM_SAO_MERGE_TYPES; mergeType++)
  {
    SAOBlkParam* mergeCandidate = NULL;

    switch(mergeType)
    {
    case SAO_MERGE_ABOVE:
      {
        if(ctuY > 0)
        {
          mergedCTUPos = ctu- m_numCTUInWidth;
          if( pic->getSAOMergeAvailability(ctu, mergedCTUPos) )
          {
            mergeCandidate = &(blkParams[mergedCTUPos]);
          }
        }
      }
      break;
    case SAO_MERGE_LEFT:
      {
        if(ctuX > 0)
        {
          mergedCTUPos = ctu- 1;
          if( pic->getSAOMergeAvailability(ctu, mergedCTUPos) )
          {
            mergeCandidate = &(blkParams[mergedCTUPos]);
          }
        }
      }
      break;
    default:
      {
        printf("not a supported merge type");
        assert(0);
        exit(-1);
      }
    }

    mergeList.push_back(mergeCandidate);
    if (mergeCandidate != NULL)
    {
      numValidMergeCandidates++;
    }
  }

  return numValidMergeCandidates;
}

Void TComSampleAdaptiveOffset::reconstructBlkSAOParam(SAOBlkParam& recParam, std::vector<SAOBlkParam*>& mergeList)
{
  for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
  {
    SAOOffset& offsetParam = recParam[compIdx];

    if(offsetParam.modeIdc == SAO_MODE_OFF)
    {
      continue;
    }

    switch(offsetParam.modeIdc)
    {
    case SAO_MODE_NEW:
      {
        invertQuantOffsets(compIdx, offsetParam.typeIdc, offsetParam.typeAuxInfo, offsetParam.offset, offsetParam.offset);
      }
      break;
    case SAO_MODE_MERGE:
      {
        SAOBlkParam* mergeTarget = mergeList[offsetParam.typeIdc];
        assert(mergeTarget != NULL);

        offsetParam = (*mergeTarget)[compIdx];
      }
      break;
    default:
      {
        printf("Not a supported mode");
        assert(0);
        exit(-1);
      }
    }
  }
}

Void TComSampleAdaptiveOffset::reconstructBlkSAOParams(TComPic* pic, SAOBlkParam* saoBlkParams)
{
  m_picSAOEnabled[SAO_Y] = m_picSAOEnabled[SAO_Cb] = m_picSAOEnabled[SAO_Cr] = false;

  for(Int ctu=0; ctu< m_numCTUsPic; ctu++)
  {
    std::vector<SAOBlkParam*> mergeList;
    getMergeList(pic, ctu, saoBlkParams, mergeList);

    reconstructBlkSAOParam(saoBlkParams[ctu], mergeList);

    for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
    {
      if(saoBlkParams[ctu][compIdx].modeIdc != SAO_MODE_OFF)
      {
        m_picSAOEnabled[compIdx] = true;
      }
    }
  }

}

Void TComSampleAdaptiveOffset::offsetBlock(Int compIdx, Int typeIdx, Int* offset  
                                          , Pel* srcBlk, Pel* resBlk, Int srcStride, Int resStride,  Int width, Int height
                                          , Bool isLeftAvail,  Bool isRightAvail, Bool isAboveAvail, Bool isBelowAvail, Bool isAboveLeftAvail, Bool isAboveRightAvail, Bool isBelowLeftAvail, Bool isBelowRightAvail)
{
  if(m_lineBufWidth != m_maxCUWidth)
  {
    m_lineBufWidth = m_maxCUWidth;
    
    if (m_signLineBuf1) delete[] m_signLineBuf1; m_signLineBuf1 = NULL;
    m_signLineBuf1 = new Char[m_lineBufWidth+1];
    
    if (m_signLineBuf2) delete[] m_signLineBuf2; m_signLineBuf2 = NULL;
    m_signLineBuf2 = new Char[m_lineBufWidth+1];
  }

  Int* offsetClip = m_offsetClip[compIdx];

  Int x,y, startX, startY, endX, endY, edgeType;
  Int firstLineStartX, firstLineEndX, lastLineStartX, lastLineEndX;
  Char signLeft, signRight, signDown;

  Pel* srcLine = srcBlk;
  Pel* resLine = resBlk;

  switch(typeIdx)
  {
  case SAO_TYPE_EO_0:
    {
      offset += 2;
      startX = isLeftAvail ? 0 : 1;
      endX   = isRightAvail ? width : (width -1);
      for (y=0; y< height; y++)
      {
#if SAO_SGN_FUNC
        signLeft = (Char)sgn(srcLine[startX] - srcLine[startX-1]);
#else
        signLeft = (Char)m_sign[srcLine[startX] - srcLine[startX-1]];
#endif
        for (x=startX; x< endX; x++)
        {
#if SAO_SGN_FUNC
          signRight = (Char)sgn(srcLine[x] - srcLine[x+1]); 
#else
          signRight = (Char)m_sign[srcLine[x] - srcLine[x+1]]; 
#endif
          edgeType =  signRight + signLeft;
          signLeft  = -signRight;

          resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];
        }
        srcLine  += srcStride;
        resLine += resStride;
      }

    }
    break;
  case SAO_TYPE_EO_90:
    {
      offset += 2;
      Char *signUpLine = m_signLineBuf1;

      startY = isAboveAvail ? 0 : 1;
      endY   = isBelowAvail ? height : height-1;
      if (!isAboveAvail)
      {
        srcLine += srcStride;
        resLine += resStride;
      }

      Pel* srcLineAbove= srcLine- srcStride;
      for (x=0; x< width; x++)
      {
#if SAO_SGN_FUNC
        signUpLine[x] = (Char)sgn(srcLine[x] - srcLineAbove[x]);
#else
        signUpLine[x] = (Char)m_sign[srcLine[x] - srcLineAbove[x]];
#endif
      }

      Pel* srcLineBelow;
      for (y=startY; y<endY; y++)
      {
        srcLineBelow= srcLine+ srcStride;

        for (x=0; x< width; x++)
        {
#if SAO_SGN_FUNC
          signDown  = (Char)sgn(srcLine[x] - srcLineBelow[x]);
#else
          signDown  = (Char)m_sign[srcLine[x] - srcLineBelow[x]]; 
#endif
          edgeType = signDown + signUpLine[x];
          signUpLine[x]= -signDown;

          resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];
        }
        srcLine += srcStride;
        resLine += resStride;
      }

    }
    break;
  case SAO_TYPE_EO_135:
    {
      offset += 2;
      Char *signUpLine, *signDownLine, *signTmpLine;

      signUpLine  = m_signLineBuf1;
      signDownLine= m_signLineBuf2;

      startX = isLeftAvail ? 0 : 1 ;
      endX   = isRightAvail ? width : (width-1);

      //prepare 2nd line's upper sign
      Pel* srcLineBelow= srcLine+ srcStride;
      for (x=startX; x< endX+1; x++)
      {
#if SAO_SGN_FUNC
        signUpLine[x] = (Char)sgn(srcLineBelow[x] - srcLine[x- 1]);
#else
        signUpLine[x] = (Char)m_sign[srcLineBelow[x] - srcLine[x- 1]];
#endif
      }

      //1st line
      Pel* srcLineAbove= srcLine- srcStride;
      firstLineStartX = isAboveLeftAvail ? 0 : 1;
      firstLineEndX   = isAboveAvail? endX: 1;
      for(x= firstLineStartX; x< firstLineEndX; x++)
      {
#if SAO_SGN_FUNC
        edgeType  =  sgn(srcLine[x] - srcLineAbove[x- 1]) - signUpLine[x+1];
#else
        edgeType  =  m_sign[srcLine[x] - srcLineAbove[x- 1]] - signUpLine[x+1];
#endif
        resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];
      }
      srcLine  += srcStride;
      resLine  += resStride;

      //middle lines
      for (y= 1; y< height-1; y++)
      {
        srcLineBelow= srcLine+ srcStride;

        for (x=startX; x<endX; x++)
        {
#if SAO_SGN_FUNC
          signDown =  (Char)sgn(srcLine[x] - srcLineBelow[x+ 1]);
#else
          signDown =  (Char)m_sign[srcLine[x] - srcLineBelow[x+ 1]] ;
#endif
          edgeType =  signDown + signUpLine[x];
          resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];

          signDownLine[x+1] = -signDown; 
        }
#if SAO_SGN_FUNC
        signDownLine[startX] = (Char)sgn(srcLineBelow[startX] - srcLine[startX-1]);
#else
        signDownLine[startX] = (Char)m_sign[srcLineBelow[startX] - srcLine[startX-1]];
#endif

        signTmpLine  = signUpLine;
        signUpLine   = signDownLine;
        signDownLine = signTmpLine;

        srcLine += srcStride;
        resLine += resStride;
      }

      //last line
      srcLineBelow= srcLine+ srcStride;
      lastLineStartX = isBelowAvail ? startX : (width -1);
      lastLineEndX   = isBelowRightAvail ? width : (width -1);
      for(x= lastLineStartX; x< lastLineEndX; x++)
      {
#if SAO_SGN_FUNC
        edgeType =  sgn(srcLine[x] - srcLineBelow[x+ 1]) + signUpLine[x];
#else
        edgeType =  m_sign[srcLine[x] - srcLineBelow[x+ 1]] + signUpLine[x];
#endif
        resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];

      }
    }
    break;
  case SAO_TYPE_EO_45:
    {
      offset += 2;
      Char *signUpLine = m_signLineBuf1+1;

      startX = isLeftAvail ? 0 : 1;
      endX   = isRightAvail ? width : (width -1);

      //prepare 2nd line upper sign
      Pel* srcLineBelow= srcLine+ srcStride;
      for (x=startX-1; x< endX; x++)
      {
#if SAO_SGN_FUNC
        signUpLine[x] = (Char)sgn(srcLineBelow[x] - srcLine[x+1]);
#else
        signUpLine[x] = (Char)m_sign[srcLineBelow[x] - srcLine[x+1]];
#endif
      }

      //first line
      Pel* srcLineAbove= srcLine- srcStride;
      firstLineStartX = isAboveAvail ? startX : (width -1 );
      firstLineEndX   = isAboveRightAvail ? width : (width-1);
      for(x= firstLineStartX; x< firstLineEndX; x++)
      {
#if SAO_SGN_FUNC
        edgeType = sgn(srcLine[x] - srcLineAbove[x+1]) -signUpLine[x-1];
#else
        edgeType = m_sign[srcLine[x] - srcLineAbove[x+1]] -signUpLine[x-1];
#endif
        resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];
      }
      srcLine += srcStride;
      resLine += resStride;

      //middle lines
      for (y= 1; y< height-1; y++)
      {
        srcLineBelow= srcLine+ srcStride;

        for(x= startX; x< endX; x++)
        {
#if SAO_SGN_FUNC
          signDown =  (Char)sgn(srcLine[x] - srcLineBelow[x-1]);
#else
          signDown =  (Char)m_sign[srcLine[x] - srcLineBelow[x-1]] ;
#endif
          edgeType =  signDown + signUpLine[x];
          resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];
          signUpLine[x-1] = -signDown; 
        }
#if SAO_SGN_FUNC
        signUpLine[endX-1] = (Char)sgn(srcLineBelow[endX-1] - srcLine[endX]);
#else
        signUpLine[endX-1] = (Char)m_sign[srcLineBelow[endX-1] - srcLine[endX]];
#endif
        srcLine  += srcStride;
        resLine += resStride;
      }

      //last line
      srcLineBelow= srcLine+ srcStride;
      lastLineStartX = isBelowLeftAvail ? 0 : 1;
      lastLineEndX   = isBelowAvail ? endX : 1;
      for(x= lastLineStartX; x< lastLineEndX; x++)
      {
#if SAO_SGN_FUNC
        edgeType = sgn(srcLine[x] - srcLineBelow[x-1]) + signUpLine[x];
#else
        edgeType = m_sign[srcLine[x] - srcLineBelow[x-1]] + signUpLine[x];
#endif
        resLine[x] = offsetClip[srcLine[x] + offset[edgeType]];

      }
    }
    break;
  case SAO_TYPE_BO:
    {
      Int shiftBits = ((compIdx == SAO_Y)?g_bitDepthY:g_bitDepthC)- NUM_SAO_BO_CLASSES_LOG2;
      for (y=0; y< height; y++)
      {
        for (x=0; x< width; x++)
        {
          resLine[x] = offsetClip[ srcLine[x] + offset[srcLine[x] >> shiftBits] ];
        }
        srcLine += srcStride;
        resLine += resStride;
      }
    }
    break;
  default:
    {
      printf("Not a supported SAO types\n");
      assert(0);
      exit(-1);
    }
  }

}

Void TComSampleAdaptiveOffset::offsetCTU(Int ctu, TComPicYuv* srcYuv, TComPicYuv* resYuv, SAOBlkParam& saoblkParam, TComPic* pPic)
{
  Bool isLeftAvail,isRightAvail,isAboveAvail,isBelowAvail,isAboveLeftAvail,isAboveRightAvail,isBelowLeftAvail,isBelowRightAvail;

  if( 
    (saoblkParam[SAO_Y ].modeIdc == SAO_MODE_OFF) &&
    (saoblkParam[SAO_Cb].modeIdc == SAO_MODE_OFF) &&
    (saoblkParam[SAO_Cr].modeIdc == SAO_MODE_OFF)
    )
  {
    return;
  }

  //block boundary availability
  pPic->getPicSym()->deriveLoopFilterBoundaryAvailibility(ctu, isLeftAvail,isRightAvail,isAboveAvail,isBelowAvail,isAboveLeftAvail,isAboveRightAvail,isBelowLeftAvail,isBelowRightAvail);

  Int yPos   = (ctu / m_numCTUInWidth)*m_maxCUHeight;
  Int xPos   = (ctu % m_numCTUInWidth)*m_maxCUWidth;
  Int height = (yPos + m_maxCUHeight > m_picHeight)?(m_picHeight- yPos):m_maxCUHeight;
  Int width  = (xPos + m_maxCUWidth  > m_picWidth )?(m_picWidth - xPos):m_maxCUWidth;

  for(Int compIdx= 0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
  {
    SAOOffset& ctbOffset = saoblkParam[compIdx];

    if(ctbOffset.modeIdc != SAO_MODE_OFF)
    {
      Bool isLuma     = (compIdx == SAO_Y);
      Int  formatShift= isLuma?0:1;

      Int  blkWidth   = (width  >> formatShift);
      Int  blkHeight  = (height >> formatShift);
      Int  blkYPos    = (yPos   >> formatShift);
      Int  blkXPos    = (xPos   >> formatShift);

      Int  srcStride = isLuma?srcYuv->getStride():srcYuv->getCStride();
      Pel* srcBlk    = getPicBuf(srcYuv, compIdx)+ (yPos >> formatShift)*srcStride+ (xPos >> formatShift);

      Int  resStride  = isLuma?resYuv->getStride():resYuv->getCStride();
      Pel* resBlk     = getPicBuf(resYuv, compIdx)+ blkYPos*resStride+ blkXPos;

      offsetBlock( compIdx, ctbOffset.typeIdc, ctbOffset.offset
                  , srcBlk, resBlk, srcStride, resStride, blkWidth, blkHeight
                  , isLeftAvail, isRightAvail
                  , isAboveAvail, isBelowAvail
                  , isAboveLeftAvail, isAboveRightAvail
                  , isBelowLeftAvail, isBelowRightAvail
                  );
    }
  } //compIdx

}

Void TComSampleAdaptiveOffset::SAOProcess(TComPic* pDecPic)
{
  if(!m_picSAOEnabled[SAO_Y] && !m_picSAOEnabled[SAO_Cb] && !m_picSAOEnabled[SAO_Cr])
  {
    return;
  }
  TComPicYuv* resYuv = pDecPic->getPicYuvRec();
  TComPicYuv* srcYuv = m_tempPicYuv;
  resYuv->copyToPic(srcYuv);
  for(Int ctu= 0; ctu < m_numCTUsPic; ctu++)
  {
    offsetCTU(ctu, srcYuv, resYuv, (pDecPic->getPicSym()->getSAOBlkParam())[ctu], pDecPic);
  } //ctu
}

Pel* TComSampleAdaptiveOffset::getPicBuf(TComPicYuv* pPicYuv, Int compIdx)
{
  Pel* pBuf = NULL;
  switch(compIdx)
  {
  case SAO_Y:
    {
      pBuf = pPicYuv->getLumaAddr();
    }
    break;
  case SAO_Cb:
    {
      pBuf = pPicYuv->getCbAddr();
    }
    break;
  case SAO_Cr:
    {
      pBuf = pPicYuv->getCrAddr();
    }
    break;
  default:
    {
      printf("Not a legal component ID for SAO\n");
      assert(0);
      exit(-1);
    }
  }

  return pBuf;
}

/** PCM LF disable process.
 * \param pcPic picture (TComPic) pointer
 * \returns Void
 *
 * \note Replace filtered sample values of PCM mode blocks with the transmitted and reconstructed ones.
 */
Void TComSampleAdaptiveOffset::PCMLFDisableProcess (TComPic* pcPic)
{
  xPCMRestoration(pcPic);
}

/** Picture-level PCM restoration.
 * \param pcPic picture (TComPic) pointer
 * \returns Void
 */
Void TComSampleAdaptiveOffset::xPCMRestoration(TComPic* pcPic)
{
  Bool  bPCMFilter = (pcPic->getSlice(0)->getSPS()->getUsePCM() && pcPic->getSlice(0)->getSPS()->getPCMFilterDisableFlag())? true : false;

  if(bPCMFilter || pcPic->getSlice(0)->getPPS()->getTransquantBypassEnableFlag())
  {
    for( UInt uiCUAddr = 0; uiCUAddr < pcPic->getNumCUsInFrame() ; uiCUAddr++ )
    {
      TComDataCU* pcCU = pcPic->getCU(uiCUAddr);

      xPCMCURestoration(pcCU, 0, 0); 
    } 
  }
}

/** PCM CU restoration.
 * \param pcCU pointer to current CU
 * \param uiAbsZorderIdx part index
 * \param uiDepth CU depth
 * \returns Void
 */
Void TComSampleAdaptiveOffset::xPCMCURestoration ( TComDataCU* pcCU, UInt uiAbsZorderIdx, UInt uiDepth )
{
  TComPic* pcPic     = pcCU->getPic();
  UInt uiCurNumParts = pcPic->getNumPartInCU() >> (uiDepth<<1);
  UInt uiQNumParts   = uiCurNumParts>>2;

  // go to sub-CU
  if( pcCU->getDepth(uiAbsZorderIdx) > uiDepth )
  {
    for ( UInt uiPartIdx = 0; uiPartIdx < 4; uiPartIdx++, uiAbsZorderIdx+=uiQNumParts )
    {
      UInt uiLPelX   = pcCU->getCUPelX() + g_auiRasterToPelX[ g_auiZscanToRaster[uiAbsZorderIdx] ];
      UInt uiTPelY   = pcCU->getCUPelY() + g_auiRasterToPelY[ g_auiZscanToRaster[uiAbsZorderIdx] ];
      if( ( uiLPelX < pcCU->getSlice()->getSPS()->getPicWidthInLumaSamples() ) && ( uiTPelY < pcCU->getSlice()->getSPS()->getPicHeightInLumaSamples() ) )
        xPCMCURestoration( pcCU, uiAbsZorderIdx, uiDepth+1 );
    }
    return;
  }

  // restore PCM samples
  if ((pcCU->getIPCMFlag(uiAbsZorderIdx)&& pcPic->getSlice(0)->getSPS()->getPCMFilterDisableFlag()) || pcCU->isLosslessCoded( uiAbsZorderIdx))
  {
    xPCMSampleRestoration (pcCU, uiAbsZorderIdx, uiDepth, TEXT_LUMA    );
    xPCMSampleRestoration (pcCU, uiAbsZorderIdx, uiDepth, TEXT_CHROMA_U);
    xPCMSampleRestoration (pcCU, uiAbsZorderIdx, uiDepth, TEXT_CHROMA_V);
  }
}

/** PCM sample restoration.
 * \param pcCU pointer to current CU
 * \param uiAbsZorderIdx part index
 * \param uiDepth CU depth
 * \param ttText texture component type
 * \returns Void
 */
Void TComSampleAdaptiveOffset::xPCMSampleRestoration (TComDataCU* pcCU, UInt uiAbsZorderIdx, UInt uiDepth, TextType ttText)
{
  TComPicYuv* pcPicYuvRec = pcCU->getPic()->getPicYuvRec();
  Pel* piSrc;
  Pel* piPcm;
  UInt uiStride;
  UInt uiWidth;
  UInt uiHeight;
  UInt uiPcmLeftShiftBit; 
  UInt uiX, uiY;
  UInt uiMinCoeffSize = pcCU->getPic()->getMinCUWidth()*pcCU->getPic()->getMinCUHeight();
  UInt uiLumaOffset   = uiMinCoeffSize*uiAbsZorderIdx;
  UInt uiChromaOffset = uiLumaOffset>>2;

  if( ttText == TEXT_LUMA )
  {
    piSrc = pcPicYuvRec->getLumaAddr( pcCU->getAddr(), uiAbsZorderIdx);
    piPcm = pcCU->getPCMSampleY() + uiLumaOffset;
    uiStride  = pcPicYuvRec->getStride();
    uiWidth  = (g_uiMaxCUWidth >> uiDepth);
    uiHeight = (g_uiMaxCUHeight >> uiDepth);
    if ( pcCU->isLosslessCoded(uiAbsZorderIdx) && !pcCU->getIPCMFlag(uiAbsZorderIdx) )
    {
      uiPcmLeftShiftBit = 0;
    }
    else
    {
      uiPcmLeftShiftBit = g_bitDepthY - pcCU->getSlice()->getSPS()->getPCMBitDepthLuma();
    }
  }
  else
  {
    if( ttText == TEXT_CHROMA_U )
    {
      piSrc = pcPicYuvRec->getCbAddr( pcCU->getAddr(), uiAbsZorderIdx );
      piPcm = pcCU->getPCMSampleCb() + uiChromaOffset;
    }
    else
    {
      piSrc = pcPicYuvRec->getCrAddr( pcCU->getAddr(), uiAbsZorderIdx );
      piPcm = pcCU->getPCMSampleCr() + uiChromaOffset;
    }

    uiStride = pcPicYuvRec->getCStride();
    uiWidth  = ((g_uiMaxCUWidth >> uiDepth)/2);
    uiHeight = ((g_uiMaxCUHeight >> uiDepth)/2);
    if ( pcCU->isLosslessCoded(uiAbsZorderIdx) && !pcCU->getIPCMFlag(uiAbsZorderIdx) )
    {
      uiPcmLeftShiftBit = 0;
    }
    else
    {
      uiPcmLeftShiftBit = g_bitDepthC - pcCU->getSlice()->getSPS()->getPCMBitDepthChroma();
    }
  }

  for( uiY = 0; uiY < uiHeight; uiY++ )
  {
    for( uiX = 0; uiX < uiWidth; uiX++ )
    {
      piSrc[uiX] = (piPcm[uiX] << uiPcmLeftShiftBit);
    }
    piPcm += uiWidth;
    piSrc += uiStride;
  }
}

//! \}
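The clipping table that create() builds in the listing above can be looked at in isolation. The sketch below is a hypothetical stand-in for m_offsetClipTable and m_offsetClip, not HM code: the table is indexed with "sample + offset", which may fall below 0 or above the maximum sample value, and returns the clipped result without any branch per sample. It assumes that MAX_SAO_TRUNCATED_BITDEPTH is 10, which is not shown in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Largest coded offset magnitude for a bit depth; same expression as in
// create() above, with MAX_SAO_TRUNCATED_BITDEPTH assumed to be 10.
inline int maxSaoOffset(int bitDepth)
{
  assert(bitDepth >= 8 && bitDepth <= 16);
  return (1 << (std::min(bitDepth, 10) - 5)) - 1;
}

// Hypothetical stand-in for m_offsetClipTable / m_offsetClip.
class ClipTable
{
public:
  ClipTable(int bitDepth, int maxOffset) : m_maxOffset(maxOffset)
  {
    assert(bitDepth >= 1 && bitDepth <= 16 && maxOffset >= 0);
    const int maxSample = (1 << bitDepth) - 1;
    m_table.resize((1 << bitDepth) + 2 * maxOffset);
    for (int i = 0; i < static_cast<int>(m_table.size()); ++i)
    {
      // Entry i stands for the value (i - maxOffset), clipped to [0, maxSample].
      m_table[i] = std::min(std::max(i - maxOffset, 0), maxSample);
    }
  }

  // Valid for -maxOffset <= v < (1 << bitDepth) + maxOffset.
  int operator()(int v) const
  {
    const int idx = v + m_maxOffset;
    assert(idx >= 0 && idx < static_cast<int>(m_table.size()));
    return m_table[idx];
  }

private:
  int              m_maxOffset;
  std::vector<int> m_table;
};
```

The table holds 2^bitDepth + 2 * maxOffset entries, the same size that create() allocates, so for 8-bit video with a maximum offset of 7 it has 270 entries.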

TEncSampleAdaptiveOffset.cpp

/* The copyright in this software is being made available under the BSD
 * License, included below. This software may be subject to other third party
 * and contributor rights, including patent rights, and no such rights are
 * granted under this license.  
 *
 * Copyright (c) 2010-2014, ITU/ISO/IEC
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *  * Redistributions of source code must retain the above copyright notice,
 *    this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright notice,
 *    this list of conditions and the following disclaimer in the documentation
 *    and/or other materials provided with the distribution.
 *  * Neither the name of the ITU/ISO/IEC nor the names of its contributors may
 *    be used to endorse or promote products derived from this software without
 *    specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS
 * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGE.
 */

/**
 \file     TEncSampleAdaptiveOffset.cpp
 \brief       estimation part of sample adaptive offset class
 */
#include "TEncSampleAdaptiveOffset.h"
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

//! \ingroup TLibEncoder
//! \{

/** rounding with IBDI
 * \param  x
 */
inline Double xRoundIbdi2(Int bitDepth, Double x)
{
  return ((x)>0) ? (Int)(((Int)(x)+(1<<(bitDepth-8-1)))/(1<<(bitDepth-8))) : ((Int)(((Int)(x)-(1<<(bitDepth-8-1)))/(1<<(bitDepth-8))));
}

inline Double xRoundIbdi(Int bitDepth, Double x)
{
  return (bitDepth > 8 ? xRoundIbdi2(bitDepth, (x)) : ((x)>=0 ? ((Int)((x)+0.5)) : ((Int)((x)-0.5)))) ;
}

TEncSampleAdaptiveOffset::TEncSampleAdaptiveOffset()
{
  m_pppcRDSbacCoder = NULL;           
  m_pcRDGoOnSbacCoder = NULL;
  m_pppcBinCoderCABAC = NULL;    
  m_statData = NULL;
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
  m_preDBFstatData = NULL;
#endif
}

TEncSampleAdaptiveOffset::~TEncSampleAdaptiveOffset()
{
  destroyEncData();
}

#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
Void TEncSampleAdaptiveOffset::createEncData(Bool isPreDBFSamplesUsed)
#else
Void TEncSampleAdaptiveOffset::createEncData()
#endif
{

  //cabac coder for RDO
  m_pppcRDSbacCoder = new TEncSbac* [NUM_SAO_CABACSTATE_LABELS];
  m_pppcBinCoderCABAC = new TEncBinCABACCounter* [NUM_SAO_CABACSTATE_LABELS];

  for(Int cs=0; cs < NUM_SAO_CABACSTATE_LABELS; cs++)
  {
    m_pppcRDSbacCoder[cs] = new TEncSbac;
    m_pppcBinCoderCABAC[cs] = new TEncBinCABACCounter;
    m_pppcRDSbacCoder   [cs]->init( m_pppcBinCoderCABAC [cs] );
  }

  //statistics
  m_statData = new SAOStatData**[m_numCTUsPic];
  for(Int i=0; i< m_numCTUsPic; i++)
  {
    m_statData[i] = new SAOStatData*[NUM_SAO_COMPONENTS];
    for(Int compIdx=0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
    {
      m_statData[i][compIdx] = new SAOStatData[NUM_SAO_NEW_TYPES];
    }
  }
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
  if(isPreDBFSamplesUsed)
  {
    m_preDBFstatData = new SAOStatData**[m_numCTUsPic];
    for(Int i=0; i< m_numCTUsPic; i++)
    {
      m_preDBFstatData[i] = new SAOStatData*[NUM_SAO_COMPONENTS];
      for(Int compIdx=0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
      {
        m_preDBFstatData[i][compIdx] = new SAOStatData[NUM_SAO_NEW_TYPES];
      }
    }

  }
#endif

#if SAO_ENCODING_CHOICE
  ::memset(m_saoDisabledRate, 0, sizeof(m_saoDisabledRate));
#endif

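  // Per component and SAO type: number of columns at the right edge (m_skipLinesR) and
  // lines at the bottom edge (m_skipLinesB) of a CTU that are skipped when gathering statistics.
  for(Int typeIdc=0; typeIdc < NUM_SAO_NEW_TYPES; typeIdc++)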
  {
    m_skipLinesR[SAO_Y ][typeIdc]= 5;
    m_skipLinesR[SAO_Cb][typeIdc]= m_skipLinesR[SAO_Cr][typeIdc]= 3;

    m_skipLinesB[SAO_Y ][typeIdc]= 4;
    m_skipLinesB[SAO_Cb][typeIdc]= m_skipLinesB[SAO_Cr][typeIdc]= 2;

#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
    if(isPreDBFSamplesUsed)
    {
      switch(typeIdc)
      {
      case SAO_TYPE_EO_0:
        {
          m_skipLinesR[SAO_Y ][typeIdc]= 5;
          m_skipLinesR[SAO_Cb][typeIdc]= m_skipLinesR[SAO_Cr][typeIdc]= 3;

          m_skipLinesB[SAO_Y ][typeIdc]= 3;
          m_skipLinesB[SAO_Cb][typeIdc]= m_skipLinesB[SAO_Cr][typeIdc]= 1;
        }
        break;
      case SAO_TYPE_EO_90:
        {
          m_skipLinesR[SAO_Y ][typeIdc]= 4;
          m_skipLinesR[SAO_Cb][typeIdc]= m_skipLinesR[SAO_Cr][typeIdc]= 2;

          m_skipLinesB[SAO_Y ][typeIdc]= 4;
          m_skipLinesB[SAO_Cb][typeIdc]= m_skipLinesB[SAO_Cr][typeIdc]= 2;
        }
        break;
      case SAO_TYPE_EO_135:
      case SAO_TYPE_EO_45:
        {
          m_skipLinesR[SAO_Y ][typeIdc]= 5;
          m_skipLinesR[SAO_Cb][typeIdc]= m_skipLinesR[SAO_Cr][typeIdc]= 3;

          m_skipLinesB[SAO_Y ][typeIdc]= 4;
          m_skipLinesB[SAO_Cb][typeIdc]= m_skipLinesB[SAO_Cr][typeIdc]= 2;
        }
        break;
      case SAO_TYPE_BO:
        {
          m_skipLinesR[SAO_Y ][typeIdc]= 4;
          m_skipLinesR[SAO_Cb][typeIdc]= m_skipLinesR[SAO_Cr][typeIdc]= 2;

          m_skipLinesB[SAO_Y ][typeIdc]= 3;
          m_skipLinesB[SAO_Cb][typeIdc]= m_skipLinesB[SAO_Cr][typeIdc]= 1;
        }
        break;
      default:
        {
          printf("Not a supported type");
          assert(0);
          exit(-1);
        }
      }
    }
#endif    
  }

}

Void TEncSampleAdaptiveOffset::destroyEncData()
{
  if(m_pppcRDSbacCoder != NULL)
  {
    for (Int cs = 0; cs < NUM_SAO_CABACSTATE_LABELS; cs ++ )
    {
      delete m_pppcRDSbacCoder[cs];
    }
    delete[] m_pppcRDSbacCoder; m_pppcRDSbacCoder = NULL;
  }

  if(m_pppcBinCoderCABAC != NULL)
  {
    for (Int cs = 0; cs < NUM_SAO_CABACSTATE_LABELS; cs ++ )
    {
      delete m_pppcBinCoderCABAC[cs];
    }
    delete[] m_pppcBinCoderCABAC; m_pppcBinCoderCABAC = NULL;
  }

  if(m_statData != NULL)
  {
    for(Int i=0; i< m_numCTUsPic; i++)
    {
      for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
      {
        delete[] m_statData[i][compIdx];
      }
      delete[] m_statData[i];
    }
    delete[] m_statData; m_statData = NULL;
  }
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
  if(m_preDBFstatData != NULL)
  {
    for(Int i=0; i< m_numCTUsPic; i++)
    {
      for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
      {
        delete[] m_preDBFstatData[i][compIdx];
      }
      delete[] m_preDBFstatData[i];
    }
    delete[] m_preDBFstatData; m_preDBFstatData = NULL;
  }

#endif
}

Void TEncSampleAdaptiveOffset::initRDOCabacCoder(TEncSbac* pcRDGoOnSbacCoder, TComSlice* pcSlice) 
{
  m_pcRDGoOnSbacCoder = pcRDGoOnSbacCoder;
  m_pcRDGoOnSbacCoder->setSlice(pcSlice);
  m_pcRDGoOnSbacCoder->resetEntropy();
  m_pcRDGoOnSbacCoder->resetBits();

  m_pcRDGoOnSbacCoder->store( m_pppcRDSbacCoder[SAO_CABACSTATE_PIC_INIT]);
}

Void TEncSampleAdaptiveOffset::SAOProcess(TComPic* pPic, Bool* sliceEnabled, const Double *lambdas
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
                                         , Bool isPreDBFSamplesUsed
#endif
                                          )
{
  TComPicYuv* orgYuv= pPic->getPicYuvOrg();
  TComPicYuv* resYuv= pPic->getPicYuvRec();
  m_lambda[SAO_Y]= lambdas[0]; m_lambda[SAO_Cb]= lambdas[1]; m_lambda[SAO_Cr]= lambdas[2];
  // Work on a border-extended copy of the reconstruction: statistics are gathered from srcYuv,
  // while the selected offsets are applied from srcYuv into resYuv.
  TComPicYuv* srcYuv = m_tempPicYuv;
  resYuv->copyToPic(srcYuv);
  srcYuv->setBorderExtension(false);
  srcYuv->extendPicBorder();

  //collect statistics
  getStatistics(m_statData, orgYuv, srcYuv, pPic);
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
  if(isPreDBFSamplesUsed)
  {
    addPreDBFStatistics(m_statData);
  }
#endif
  //slice on/off 
  decidePicParams(sliceEnabled, pPic->getSlice(0)->getDepth()); 

  //block on/off 
  SAOBlkParam* reconParams = new SAOBlkParam[m_numCTUsPic]; //temporary parameter buffer for storing reconstructed SAO parameters
  decideBlkParams(pPic, sliceEnabled, m_statData, srcYuv, resYuv, reconParams, pPic->getPicSym()->getSAOBlkParam());
  delete[] reconParams;

}

#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
Void TEncSampleAdaptiveOffset::getPreDBFStatistics(TComPic* pPic)
{
  getStatistics(m_preDBFstatData, pPic->getPicYuvOrg(), pPic->getPicYuvRec(), pPic, true);
}

Void TEncSampleAdaptiveOffset::addPreDBFStatistics(SAOStatData***blkStats)
{
  for(Int n=0; n< m_numCTUsPic; n++)
  {
    for(Int compIdx=0; compIdx < NUM_SAO_COMPONENTS; compIdx++)
    {
      for(Int typeIdc=0; typeIdc < NUM_SAO_NEW_TYPES; typeIdc++)
      {
        blkStats[n][compIdx][typeIdc] += m_preDBFstatData[n][compIdx][typeIdc];
      }
    }
  }
}

#endif

Void TEncSampleAdaptiveOffset::getStatistics(SAOStatData***blkStats, TComPicYuv* orgYuv, TComPicYuv* srcYuv, TComPic* pPic
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
                          , Bool isCalculatePreDeblockSamples
#endif
                          )
{
  Bool isLeftAvail,isRightAvail,isAboveAvail,isBelowAvail,isAboveLeftAvail,isAboveRightAvail,isBelowLeftAvail,isBelowRightAvail;

  for(Int ctu= 0; ctu < m_numCTUsPic; ctu++)
  {
    Int yPos   = (ctu / m_numCTUInWidth)*m_maxCUHeight;
    Int xPos   = (ctu % m_numCTUInWidth)*m_maxCUWidth;
    Int height = (yPos + m_maxCUHeight > m_picHeight)?(m_picHeight- yPos):m_maxCUHeight;
    Int width  = (xPos + m_maxCUWidth  > m_picWidth )?(m_picWidth - xPos):m_maxCUWidth;

    pPic->getPicSym()->deriveLoopFilterBoundaryAvailibility(ctu, isLeftAvail,isRightAvail,isAboveAvail,isBelowAvail,isAboveLeftAvail,isAboveRightAvail,isBelowLeftAvail,isBelowRightAvail);

    //NOTE: The number of skipped lines during gathering CTU statistics depends on the slice boundary availabilities.
    //For simplicity, here only picture boundaries are considered.

    isRightAvail      = (xPos + m_maxCUWidth  < m_picWidth );
    isBelowAvail      = (yPos + m_maxCUHeight < m_picHeight);
    isBelowRightAvail = (isRightAvail && isBelowAvail);
    isBelowLeftAvail  = ((xPos > 0) && (isBelowAvail));
    isAboveRightAvail = ((yPos > 0) && (isRightAvail));

    for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
    {
      Bool isLuma     = (compIdx == SAO_Y);
      Int  formatShift= isLuma?0:1;

      Int  srcStride = isLuma?srcYuv->getStride():srcYuv->getCStride();
      Pel* srcBlk    = getPicBuf(srcYuv, compIdx)+ (yPos >> formatShift)*srcStride+ (xPos >> formatShift);

      Int  orgStride  = isLuma?orgYuv->getStride():orgYuv->getCStride();
      Pel* orgBlk     = getPicBuf(orgYuv, compIdx)+ (yPos >> formatShift)*orgStride+ (xPos >> formatShift);

      getBlkStats(compIdx, blkStats[ctu][compIdx]  
                , srcBlk, orgBlk, srcStride, orgStride, (width  >> formatShift), (height >> formatShift)
                , isLeftAvail,  isRightAvail, isAboveAvail, isBelowAvail, isAboveLeftAvail, isAboveRightAvail, isBelowLeftAvail, isBelowRightAvail
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
                , isCalculatePreDeblockSamples
#endif
                );

    }
  }
}

Void TEncSampleAdaptiveOffset::decidePicParams(Bool* sliceEnabled, Int picTempLayer)
{
  //decide sliceEnabled[compIdx]
  for (Int compIdx=0; compIdx<NUM_SAO_COMPONENTS; compIdx++)
  {
    // reset flags & counters
    sliceEnabled[compIdx] = true;

#if SAO_ENCODING_CHOICE
#if SAO_ENCODING_CHOICE_CHROMA
    // decide slice-level on/off based on previous results
    if( (picTempLayer > 0) 
      && (m_saoDisabledRate[compIdx][picTempLayer-1] > ((compIdx==SAO_Y) ? SAO_ENCODING_RATE : SAO_ENCODING_RATE_CHROMA)) )
    {
      sliceEnabled[compIdx] = false;
    }
#else
    // decide slice-level on/off based on previous results
    if( (picTempLayer > 0) 
      && (m_saoDisabledRate[SAO_Y][0] > SAO_ENCODING_RATE) )
    {
      sliceEnabled[compIdx] = false;
    }
#endif
#endif
  }
}

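// Sum the estimated distortion change over the classes that carry an offset:
// all EO classes, or the four consecutive bands starting at typeAuxInfo for BO (with wrap-around).
Int64 TEncSampleAdaptiveOffset::getDistortion(Int ctu, Int compIdx, Int typeIdc, Int typeAuxInfo, Int* invQuantOffset, SAOStatData& statData)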
{
  Int64 dist=0;
  Int inputBitDepth    = (compIdx == SAO_Y) ? g_bitDepthY : g_bitDepthC ;
  Int shift = 2 * DISTORTION_PRECISION_ADJUSTMENT(inputBitDepth-8);

  switch(typeIdc)
  {
    case SAO_TYPE_EO_0:
    case SAO_TYPE_EO_90:
    case SAO_TYPE_EO_135:
    case SAO_TYPE_EO_45:
      {
        for (Int offsetIdx=0; offsetIdx<NUM_SAO_EO_CLASSES; offsetIdx++)
        {
          dist += estSaoDist( statData.count[offsetIdx], invQuantOffset[offsetIdx], statData.diff[offsetIdx], shift);
        }        
      }
      break;
    case SAO_TYPE_BO:
      {
        for (Int offsetIdx=typeAuxInfo; offsetIdx<typeAuxInfo+4; offsetIdx++)
        {
          Int bandIdx = offsetIdx % NUM_SAO_BO_CLASSES ; 
          dist += estSaoDist( statData.count[bandIdx], invQuantOffset[bandIdx], statData.diff[bandIdx], shift);
        }
      }
      break;
    default:
      {
        printf("Not a supported type");
        assert(0);
        exit(-1);
      }
  }

  return dist;
}

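// Estimated change in squared error when 'offset' (h) is added to the N = count samples of one class.
// With E = diffSum, the accumulated (original - reconstructed) difference of that class:
//   sum((org - (rec + h))^2) - sum((org - rec)^2) = N*h*h - 2*h*E
// A negative result means the offset reduces distortion. The value is scaled down by 'shift'.
inline Int64 TEncSampleAdaptiveOffset::estSaoDist(Int64 count, Int64 offset, Int64 diffSum, Int shift)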
{
  return (( count*offset*offset-diffSum*offset*2 ) >> shift);
}

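// Starting from offsetInput, step the offset magnitude toward zero and keep the candidate with the
// lowest RD cost (distortion + lambda * rate). Returns 0 if no candidate beats the cost of a zero
// offset; in that case bestDist and bestCost are left unchanged.
inline Int TEncSampleAdaptiveOffset::estIterOffset(Int typeIdx, Int classIdx, Double lambda, Int offsetInput, Int64 count, Int64 diffSum, Int shift, Int bitIncrease, Int64& bestDist, Double& bestCost, Int offsetTh )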
{
  Int iterOffset, tempOffset;
  Int64 tempDist, tempRate;
  Double tempCost, tempMinCost;
  Int offsetOutput = 0;
  iterOffset = offsetInput;
  // A quantized offset of 0 is assumed to leave the distortion unchanged and to cost 1 bit,
  // so the cost to beat is lambda. An entropy coder could be used to measure the exact rate here.
  tempMinCost = lambda; 
  while (iterOffset != 0)
  {
    // Calculate the bits required for signaling the offset
    tempRate = (typeIdx == SAO_TYPE_BO) ? (abs((Int)iterOffset)+2) : (abs((Int)iterOffset)+1); 
    if (abs((Int)iterOffset)==offsetTh) //inclusive 
    {  
      tempRate --;
    }
    // Do the dequantization before distortion calculation
    tempOffset  = iterOffset << bitIncrease;
    tempDist    = estSaoDist( count, tempOffset, diffSum, shift);
    tempCost    = ((Double)tempDist + lambda * (Double) tempRate);
    if(tempCost < tempMinCost)
    {
      tempMinCost = tempCost;
      offsetOutput = iterOffset;
      bestDist = tempDist;
      bestCost = tempCost;
    }
    iterOffset = (iterOffset > 0) ? (iterOffset-1):(iterOffset+1);
  }
  return offsetOutput;
}

Void TEncSampleAdaptiveOffset::deriveOffsets(Int ctu, Int compIdx, Int typeIdc, SAOStatData& statData, Int* quantOffsets, Int& typeAuxInfo)
{
  Int bitDepth = (compIdx== SAO_Y) ? g_bitDepthY : g_bitDepthC;
  Int shift = 2 * DISTORTION_PRECISION_ADJUSTMENT(bitDepth-8);
  Int offsetTh = g_saoMaxOffsetQVal[compIdx];  //inclusive

  ::memset(quantOffsets, 0, sizeof(Int)*MAX_NUM_SAO_CLASSES);

  //derive initial offsets 
  Int numClasses = (typeIdc == SAO_TYPE_BO)?((Int)NUM_SAO_BO_CLASSES):((Int)NUM_SAO_EO_CLASSES);
  for(Int classIdx=0; classIdx< numClasses; classIdx++)
  {
    if( (typeIdc != SAO_TYPE_BO) && (classIdx==SAO_CLASS_EO_PLAIN)  ) 
    {
      continue; //offset will be zero
    }

    if(statData.count[classIdx] == 0)
    {
      continue; //offset will be zero
    }

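    // initial offset = rounded average difference per sample, in units of the offset step (1 << m_offsetStepLog2)
    quantOffsets[classIdx] = (Int) xRoundIbdi(bitDepth, (Double)( statData.diff[classIdx]<<(bitDepth-8))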
                                                                  / 
                                                          (Double)( statData.count[classIdx]<< m_offsetStepLog2[compIdx])
                                               );
    quantOffsets[classIdx] = Clip3(-offsetTh, offsetTh, quantOffsets[classIdx]);
  }

  // adjust offsets
  switch(typeIdc)
  {
    case SAO_TYPE_EO_0:
    case SAO_TYPE_EO_90:
    case SAO_TYPE_EO_135:
    case SAO_TYPE_EO_45:
      {
        Int64 classDist;
        Double classCost;
        for(Int classIdx=0; classIdx<NUM_SAO_EO_CLASSES; classIdx++)  
        {         
          // EO offsets are sign-constrained: valley classes may only be raised, peak classes only lowered
          if(classIdx==SAO_CLASS_EO_FULL_VALLEY && quantOffsets[classIdx] < 0) quantOffsets[classIdx] =0;
          if(classIdx==SAO_CLASS_EO_HALF_VALLEY && quantOffsets[classIdx] < 0) quantOffsets[classIdx] =0;
          if(classIdx==SAO_CLASS_EO_HALF_PEAK   && quantOffsets[classIdx] > 0) quantOffsets[classIdx] =0;
          if(classIdx==SAO_CLASS_EO_FULL_PEAK   && quantOffsets[classIdx] > 0) quantOffsets[classIdx] =0;

          if( quantOffsets[classIdx] != 0 ) //iterative adjustment only when derived offset is not zero
          {
            quantOffsets[classIdx] = estIterOffset( typeIdc, classIdx, m_lambda[compIdx], quantOffsets[classIdx], statData.count[classIdx], statData.diff[classIdx], shift, m_offsetStepLog2[compIdx], classDist , classCost , offsetTh );
          }
        }
      
        typeAuxInfo =0;
      }
      break;
    case SAO_TYPE_BO:
      {
        Int64  distBOClasses[NUM_SAO_BO_CLASSES];
        Double costBOClasses[NUM_SAO_BO_CLASSES];
        ::memset(distBOClasses, 0, sizeof(Int64)*NUM_SAO_BO_CLASSES);
        for(Int classIdx=0; classIdx< NUM_SAO_BO_CLASSES; classIdx++)
        {         
          costBOClasses[classIdx]= m_lambda[compIdx];
          if( quantOffsets[classIdx] != 0 ) //iterative adjustment only when derived offset is not zero
          {
            quantOffsets[classIdx] = estIterOffset( typeIdc, classIdx, m_lambda[compIdx], quantOffsets[classIdx], statData.count[classIdx], statData.diff[classIdx], shift, m_offsetStepLog2[compIdx], distBOClasses[classIdx], costBOClasses[classIdx], offsetTh );
          }
        }

        //decide the starting band index: pick the 4 consecutive bands with the lowest total RD cost
        Double minCost = MAX_DOUBLE, cost;
        for(Int band=0; band< NUM_SAO_BO_CLASSES- 4+ 1; band++) 
        {
          cost  = costBOClasses[band  ];
          cost += costBOClasses[band+1];
          cost += costBOClasses[band+2];
          cost += costBOClasses[band+3];

          if(cost < minCost)
          {
            minCost = cost;
            typeAuxInfo = band;
          }
        }
        //clear those unused classes
        Int clearQuantOffset[NUM_SAO_BO_CLASSES];
        ::memset(clearQuantOffset, 0, sizeof(Int)*NUM_SAO_BO_CLASSES);
        for(Int i=0; i< 4; i++) 
        {
          Int band = (typeAuxInfo+i)%NUM_SAO_BO_CLASSES;
          clearQuantOffset[band] = quantOffsets[band];
        }
        ::memcpy(quantOffsets, clearQuantOffset, sizeof(Int)*NUM_SAO_BO_CLASSES);        
      }
      break;
    default:
      {
        printf("Not a supported type");
        assert(0);
        exit(-1);
      }

  }

}

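// Test SAO_MODE_NEW for one CTU: choose the best type for luma, then one type shared by Cb and Cr
// (offsets are still derived per component). The all-off case is the starting point of each search.
// modeNormCost is returned normalized by lambda: sum(distortion / lambda) + rate in bits.
Void TEncSampleAdaptiveOffset::deriveModeNewRDO(Int ctu, std::vector<SAOBlkParam*>& mergeList, Bool* sliceEnabled, SAOStatData***blkStats, SAOBlkParam& modeParam, Double& modeNormCost, TEncSbac**cabacCoderRDO, Int inCabacLabel)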
{
  Double minCost, cost;
  Int rate;
  UInt previousWrittenBits;
  Int64 dist[NUM_SAO_COMPONENTS], modeDist[NUM_SAO_COMPONENTS];
  SAOOffset testOffset[NUM_SAO_COMPONENTS];
  Int compIdx;
  Int invQuantOffset[MAX_NUM_SAO_CLASSES];

  modeDist[SAO_Y]= modeDist[SAO_Cb] = modeDist[SAO_Cr] = 0;

  //pre-encode merge flags
  modeParam[SAO_Y ].modeIdc = SAO_MODE_OFF;
  m_pcRDGoOnSbacCoder->load(cabacCoderRDO[inCabacLabel]);
  m_pcRDGoOnSbacCoder->codeSAOBlkParam(modeParam, sliceEnabled, (mergeList[SAO_MERGE_LEFT]!= NULL), (mergeList[SAO_MERGE_ABOVE]!= NULL), true);
  m_pcRDGoOnSbacCoder->store(cabacCoderRDO[SAO_CABACSTATE_BLK_MID]);

  //------ luma --------//
  compIdx = SAO_Y;
  //"off" case as initial cost
  modeParam[compIdx].modeIdc = SAO_MODE_OFF;
  m_pcRDGoOnSbacCoder->resetBits();
  m_pcRDGoOnSbacCoder->codeSAOOffsetParam(compIdx, modeParam[compIdx], sliceEnabled[compIdx]);
  modeDist[compIdx] = 0;
  minCost= m_lambda[compIdx]*((Double)m_pcRDGoOnSbacCoder->getNumberOfWrittenBits());
  m_pcRDGoOnSbacCoder->store(cabacCoderRDO[SAO_CABACSTATE_BLK_TEMP]);
  if(sliceEnabled[compIdx])
  {
    for(Int typeIdc=0; typeIdc< NUM_SAO_NEW_TYPES; typeIdc++)
    {
      testOffset[compIdx].modeIdc = SAO_MODE_NEW;
      testOffset[compIdx].typeIdc = typeIdc;

      //derive coded offset
      deriveOffsets(ctu, compIdx, typeIdc, blkStats[ctu][compIdx][typeIdc], testOffset[compIdx].offset, testOffset[compIdx].typeAuxInfo);

      //inverse-quantize the offsets
      invertQuantOffsets(compIdx, typeIdc, testOffset[compIdx].typeAuxInfo, invQuantOffset, testOffset[compIdx].offset);

      //get distortion
      dist[compIdx] = getDistortion(ctu, compIdx, testOffset[compIdx].typeIdc, testOffset[compIdx].typeAuxInfo, invQuantOffset, blkStats[ctu][compIdx][typeIdc]);

      //get rate
      m_pcRDGoOnSbacCoder->load(cabacCoderRDO[SAO_CABACSTATE_BLK_MID]);
      m_pcRDGoOnSbacCoder->resetBits();
      m_pcRDGoOnSbacCoder->codeSAOOffsetParam(compIdx, testOffset[compIdx], sliceEnabled[compIdx]);
      rate = m_pcRDGoOnSbacCoder->getNumberOfWrittenBits();
      cost = (Double)dist[compIdx] + m_lambda[compIdx]*((Double)rate);
      if(cost < minCost)
      {
        minCost = cost;
        modeDist[compIdx] = dist[compIdx];
        modeParam[compIdx]= testOffset[compIdx];
        m_pcRDGoOnSbacCoder->store(cabacCoderRDO[SAO_CABACSTATE_BLK_TEMP]);
      }
    }
  }
  m_pcRDGoOnSbacCoder->load(cabacCoderRDO[SAO_CABACSTATE_BLK_TEMP]);
  m_pcRDGoOnSbacCoder->store(cabacCoderRDO[SAO_CABACSTATE_BLK_MID]);

  //------ chroma --------//
  //"off" case as initial cost
  cost = 0;
  previousWrittenBits = 0;
  m_pcRDGoOnSbacCoder->resetBits();
  for (Int component = SAO_Cb; component < NUM_SAO_COMPONENTS; component++)
  {
    modeParam[component].modeIdc = SAO_MODE_OFF; 
    modeDist [component] = 0;

    m_pcRDGoOnSbacCoder->codeSAOOffsetParam(component, modeParam[component], sliceEnabled[component]);

    const UInt currentWrittenBits = m_pcRDGoOnSbacCoder->getNumberOfWrittenBits();
    cost += m_lambda[component] * (currentWrittenBits - previousWrittenBits);
    previousWrittenBits = currentWrittenBits;
  }

  minCost = cost;

  //doesn't need to store cabac status here since the whole CTU parameters will be re-encoded at the end of this function

  for(Int typeIdc=0; typeIdc< NUM_SAO_NEW_TYPES; typeIdc++)
  {
    m_pcRDGoOnSbacCoder->load(cabacCoderRDO[SAO_CABACSTATE_BLK_MID]);
    m_pcRDGoOnSbacCoder->resetBits();
    previousWrittenBits = 0;
    cost = 0;

    for(compIdx= SAO_Cb; compIdx< NUM_SAO_COMPONENTS; compIdx++)
    {
      if(!sliceEnabled[compIdx])
      {
        testOffset[compIdx].modeIdc = SAO_MODE_OFF;
        dist[compIdx]= 0;
        continue;
      }
      testOffset[compIdx].modeIdc = SAO_MODE_NEW;
      testOffset[compIdx].typeIdc = typeIdc;

      //derive offset & get distortion
      deriveOffsets(ctu, compIdx, typeIdc, blkStats[ctu][compIdx][typeIdc], testOffset[compIdx].offset, testOffset[compIdx].typeAuxInfo);
      invertQuantOffsets(compIdx, typeIdc, testOffset[compIdx].typeAuxInfo, invQuantOffset, testOffset[compIdx].offset);
      dist[compIdx]= getDistortion(ctu, compIdx, typeIdc, testOffset[compIdx].typeAuxInfo, invQuantOffset, blkStats[ctu][compIdx][typeIdc]);
      
      m_pcRDGoOnSbacCoder->codeSAOOffsetParam(compIdx, testOffset[compIdx], sliceEnabled[compIdx]);

      const UInt currentWrittenBits = m_pcRDGoOnSbacCoder->getNumberOfWrittenBits();
      cost += dist[compIdx] + (m_lambda[compIdx] * (currentWrittenBits - previousWrittenBits));
      previousWrittenBits = currentWrittenBits;
    }

    if(cost < minCost)
    {
      minCost = cost;
      for(compIdx= SAO_Cb; compIdx< NUM_SAO_COMPONENTS; compIdx++)
      {
        modeDist [compIdx] = dist      [compIdx];
        modeParam[compIdx] = testOffset[compIdx];
      }
    }
  }

  //----- re-gen rate & normalized cost----//
  modeNormCost = 0;
  for(UInt component = SAO_Y; component < NUM_SAO_COMPONENTS; component++)
  {
    modeNormCost += (Double)modeDist[component] / m_lambda[component];
  }
  m_pcRDGoOnSbacCoder->load(cabacCoderRDO[inCabacLabel]);
  m_pcRDGoOnSbacCoder->resetBits();
  m_pcRDGoOnSbacCoder->codeSAOBlkParam(modeParam, sliceEnabled, (mergeList[SAO_MERGE_LEFT]!= NULL), (mergeList[SAO_MERGE_ABOVE]!= NULL), false);
  modeNormCost += (Double)m_pcRDGoOnSbacCoder->getNumberOfWrittenBits();

}

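// Test SAO_MODE_MERGE for one CTU: try each available merge candidate (left, above) and keep the one
// with the lowest normalized cost. modeNormCost stays at MAX_DOUBLE when no candidate is available.
Void TEncSampleAdaptiveOffset::deriveModeMergeRDO(Int ctu, std::vector<SAOBlkParam*>& mergeList, Bool* sliceEnabled, SAOStatData***blkStats, SAOBlkParam& modeParam, Double& modeNormCost, TEncSbac**cabacCoderRDO, Int inCabacLabel)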
{
  Int mergeListSize = (Int)mergeList.size();
  modeNormCost = MAX_DOUBLE;

  Double cost;
  SAOBlkParam testBlkParam;

  for(Int mergeType=0; mergeType< mergeListSize; mergeType++)
  {
    if(mergeList[mergeType] == NULL)
    {
      continue;
    }

    testBlkParam = *(mergeList[mergeType]);
    //normalized distortion
    Double normDist=0;
    for(Int compIdx=0; compIdx< NUM_SAO_COMPONENTS; compIdx++)
    {
      testBlkParam[compIdx].modeIdc = SAO_MODE_MERGE;
      testBlkParam[compIdx].typeIdc = mergeType;

      SAOOffset& mergedOffsetParam = (*(mergeList[mergeType]))[compIdx];

      if( mergedOffsetParam.modeIdc != SAO_MODE_OFF)
      {
        //offsets have been reconstructed. Don't call the inverse quantization function.
        normDist += (((Double)getDistortion(ctu, compIdx, mergedOffsetParam.typeIdc, mergedOffsetParam.typeAuxInfo, mergedOffsetParam.offset, blkStats[ctu][compIdx][mergedOffsetParam.typeIdc]))
                       /m_lambda[compIdx]
                    );
      }

    }

    //rate
    m_pcRDGoOnSbacCoder->load(cabacCoderRDO[inCabacLabel]);
    m_pcRDGoOnSbacCoder->resetBits();
    m_pcRDGoOnSbacCoder->codeSAOBlkParam(testBlkParam, sliceEnabled, (mergeList[SAO_MERGE_LEFT]!= NULL), (mergeList[SAO_MERGE_ABOVE]!= NULL), false);
    Int rate = m_pcRDGoOnSbacCoder->getNumberOfWrittenBits();

    cost = normDist+(Double)rate;

    if(cost < modeNormCost)
    {
      modeNormCost = cost;
      modeParam    = testBlkParam;
      m_pcRDGoOnSbacCoder->store(cabacCoderRDO[SAO_CABACSTATE_BLK_TEMP]);
    }
  }

  m_pcRDGoOnSbacCoder->load(cabacCoderRDO[SAO_CABACSTATE_BLK_TEMP]);

}

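// For every CTU, choose between new parameters and merging by normalized RD cost
// (SAO_MODE_OFF is covered by the SAO_MODE_NEW test), then apply the reconstructed offsets.
Void TEncSampleAdaptiveOffset::decideBlkParams(TComPic* pic, Bool* sliceEnabled, SAOStatData***blkStats, TComPicYuv* srcYuv, TComPicYuv* resYuv, SAOBlkParam* reconParams, SAOBlkParam* codedParams)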
{
  Bool isAllBlksDisabled = false;
  if(!sliceEnabled[SAO_Y] && !sliceEnabled[SAO_Cb] && !sliceEnabled[SAO_Cr])
  {
    isAllBlksDisabled = true;
  }

  m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[ SAO_CABACSTATE_PIC_INIT ]);

  SAOBlkParam modeParam;
  Double minCost, modeCost;

  for(Int ctu=0; ctu< m_numCTUsPic; ctu++)
  {
    if(isAllBlksDisabled)
    {
      codedParams[ctu].reset();
      continue;
    }

    m_pcRDGoOnSbacCoder->store(m_pppcRDSbacCoder[ SAO_CABACSTATE_BLK_CUR ]);

    //get merge list
    std::vector<SAOBlkParam*> mergeList;
    getMergeList(pic, ctu, reconParams, mergeList);

    minCost = MAX_DOUBLE;
    for(Int mode=0; mode < NUM_SAO_MODES; mode++)
    {
      switch(mode)
      {
      case SAO_MODE_OFF:
        {
          continue; //not necessary, since all-off case will be tested in SAO_MODE_NEW case.
        }
        break;
      case SAO_MODE_NEW:
        {
          deriveModeNewRDO(ctu, mergeList, sliceEnabled, blkStats, modeParam, modeCost, m_pppcRDSbacCoder, SAO_CABACSTATE_BLK_CUR);

        }
        break;
      case SAO_MODE_MERGE:
        {
          deriveModeMergeRDO(ctu, mergeList, sliceEnabled, blkStats , modeParam, modeCost, m_pppcRDSbacCoder, SAO_CABACSTATE_BLK_CUR);
        }
        break;
      default:
        {
          printf("Not a supported SAO mode\n");
          assert(0);
          exit(-1);
        }
      }

      if(modeCost < minCost)
      {
        minCost = modeCost;
        codedParams[ctu] = modeParam;
        m_pcRDGoOnSbacCoder->store(m_pppcRDSbacCoder[ SAO_CABACSTATE_BLK_NEXT ]);

      }
    } //mode
    m_pcRDGoOnSbacCoder->load(m_pppcRDSbacCoder[ SAO_CABACSTATE_BLK_NEXT ]);

    //apply reconstructed offsets
    reconParams[ctu] = codedParams[ctu];
    reconstructBlkSAOParam(reconParams[ctu], mergeList);
    offsetCTU(ctu, srcYuv, resYuv, reconParams[ctu], pic);
  } //ctu

#if SAO_ENCODING_CHOICE 
  // Record the fraction of CTUs with SAO switched off in this picture;
  // decidePicParams() uses it to disable SAO at slice level for later pictures.
  Int picTempLayer = pic->getSlice(0)->getDepth();
  Int numLcusForSAOOff[NUM_SAO_COMPONENTS];
  numLcusForSAOOff[SAO_Y ] = numLcusForSAOOff[SAO_Cb]= numLcusForSAOOff[SAO_Cr]= 0;

  for (Int compIdx=0; compIdx<NUM_SAO_COMPONENTS; compIdx++)
  {
    for(Int ctu=0; ctu< m_numCTUsPic; ctu++)
    {
      if( reconParams[ctu][compIdx].modeIdc == SAO_MODE_OFF)
      {
        numLcusForSAOOff[compIdx]++;
      }
    }
  }
#if SAO_ENCODING_CHOICE_CHROMA
  for (Int compIdx=0; compIdx<NUM_SAO_COMPONENTS; compIdx++)
  {
    m_saoDisabledRate[compIdx][picTempLayer] = (Double)numLcusForSAOOff[compIdx]/(Double)m_numCTUsPic;
  }
#else
  if (picTempLayer == 0)
  {
    m_saoDisabledRate[SAO_Y][0] = (Double)(numLcusForSAOOff[SAO_Y]+numLcusForSAOOff[SAO_Cb]+numLcusForSAOOff[SAO_Cr])/(Double)(m_numCTUsPic*3);
  }
#endif                                              
#endif
}

Void TEncSampleAdaptiveOffset::getBlkStats(Int compIdx, SAOStatData* statsDataTypes  
                        , Pel* srcBlk, Pel* orgBlk, Int srcStride, Int orgStride, Int width, Int height
                        , Bool isLeftAvail,  Bool isRightAvail, Bool isAboveAvail, Bool isBelowAvail, Bool isAboveLeftAvail, Bool isAboveRightAvail, Bool isBelowLeftAvail, Bool isBelowRightAvail
#if SAO_ENCODE_ALLOW_USE_PREDEBLOCK
                        , Bool isCalculatePreDeblockSamples
#endif
                        )
{
  if(m_lineBufWidth != m_maxCUWidth)
  {
    m_lineBufWidth = m_maxCUWidth;

    // (re)allocate the sign line buffers; delete[] on a NULL pointer is a no-op
    delete[] m_signLineBuf1;
    m_signLineBuf1 = new Char[m_lineBufWidth+1];

    delete[] m_signLineBuf2;
    m_signLineBuf2 = new Char[m_lineBufWidth+1];
  }

  Int x,y, startX, startY, endX, endY, edgeType, firstLineStartX, firstLineEndX;
  Char signLeft, signRight, signDown;
  Int64 *diff, *count;
  Pel *srcLine, *orgLine;
  Int* skipLinesR = m_skipLinesR[