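Summarization autoevaluation metrics

Summarization autoevaluation (autoevaluation) uses generative AI to evaluate the quality of AI-generated summaries based on accuracy, completeness, and adherence.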

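In some cases, the adherence and completeness scores appear as N/A:

Adherence evaluates only summaries that use custom sections. If a summary uses prebuilt sections, the score is N/A.
Completeness evaluates only noncategorical summaries that contain free-form text. If a summary uses categorical values, the score is N/A.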
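
The two N/A rules above can be sketched as a small lookup. This is only an illustration: the function name `applicable_metrics` and its two boolean inputs are hypothetical, and treating accuracy as always applicable is an assumption based on the text naming only adherence and completeness as possible N/A scores.

```python
from typing import Dict


def applicable_metrics(uses_custom_sections: bool, is_categorical: bool) -> Dict[str, bool]:
    """Return which metrics would produce a score (False means N/A).

    Hypothetical helper, not part of any product API.
    """
    return {
        # Assumption: the text never describes accuracy as N/A.
        "accuracy": True,
        # Adherence is scored only for summaries that use custom sections.
        "adherence": uses_custom_sections,
        # Completeness is scored only for noncategorical (free-form) summaries.
        "completeness": not is_categorical,
    }


if __name__ == "__main__":
    # A prebuilt-section summary that returns a categorical value.
    print(applicable_metrics(uses_custom_sections=False, is_categorical=True))
```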

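Accuracy

Accuracy measures how closely a summary matches the factual details of the conversation transcript. For each summary, autoevaluation determines an accuracy ratio together with its rationale. A low accuracy score means the summary contains factual errors.

Accuracy results appear as follows: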

{
  "decomposition": [
    {
        "point": "The customer wants to cancel their subscription.",
        "accuracy": "This is accurate. The customer calls to get support of cancelling their subscription.",
        "is_accurate": true
    },
    {
        "point": "The customer asks about a $30 credit.",
        "accuracy": "This is inaccurate. The customer mentioned $10.",
        "is_accurate": false
    }
  ]
}

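Each point in the preceding example is a decomposed part of the summary. The binary parameter is_accurate gives the result of the accuracy evaluation, and the accuracy parameter gives the rationale.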
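
As a rough illustration of how such a result could be consumed, the following sketch parses the example and computes the share of accurate points. The helper names are hypothetical, and treating the accuracy ratio as "accurate points divided by all points" is an assumption; the text does not state the exact formula.

```python
import json
from typing import List, Tuple

# The accuracy example from this section, as a JSON string.
SAMPLE_RESULT = """
{
  "decomposition": [
    {
      "point": "The customer wants to cancel their subscription.",
      "accuracy": "This is accurate. The customer calls to get support of cancelling their subscription.",
      "is_accurate": true
    },
    {
      "point": "The customer asks about a $30 credit.",
      "accuracy": "This is inaccurate. The customer mentioned $10.",
      "is_accurate": false
    }
  ]
}
"""


def accuracy_ratio(result_json: str) -> float:
    """Fraction of decomposed points marked accurate (assumed formula)."""
    points = json.loads(result_json)["decomposition"]
    if not points:
        raise ValueError("result contains no decomposed points")
    return sum(1 for p in points if p["is_accurate"]) / len(points)


def inaccurate_points(result_json: str) -> List[Tuple[str, str]]:
    """Return (point, rationale) pairs for every point marked inaccurate."""
    points = json.loads(result_json)["decomposition"]
    return [(p["point"], p["accuracy"]) for p in points if not p["is_accurate"]]


if __name__ == "__main__":
    print(accuracy_ratio(SAMPLE_RESULT))  # 0.5
    for point, rationale in inaccurate_points(SAMPLE_RESULT):
        print(point, "->", rationale)
```

Listing the inaccurate points alongside their rationale makes it easier to see which factual detail the summary got wrong.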

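Adherence

Summarization autoevaluation applies a set of questions to the provided summary. It uses these questions and the conversation transcript to evaluate whether the summary adheres to each instruction. However, summarization autoevaluation relies on Gemini, which can't precisely verify grammatical instructions. As a result, it might not accurately evaluate whether a summary adheres to grammatical instructions.

A low adherence score means the summary fails to follow the instructions provided in the summary section definitions. Only summaries that use custom sections can produce an adherence score.

For adherence, summarization autoevaluation recognizes the following two types of summarization tasks:

Categorical summary: Provides a categorical value defined in the instruction. For example, the instruction asks for a response of Sunny or Cloudy. Autoevaluation checks that the summary provides only Sunny or Cloudy, with no descriptive text.
Noncategorical summary: Provides free-form text. Autoevaluation checks that the noncategorical summary follows the instructions defined in the task description.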
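
The categorical rule can be approximated with a deterministic check, shown here only to make the rule concrete. Autoevaluation itself uses a generative model, so this is not how the product performs the check; the function name `is_valid_categorical` and the choice to ignore surrounding whitespace are assumptions.

```python
from typing import Iterable


def is_valid_categorical(summary: str, allowed_values: Iterable[str]) -> bool:
    """True only if the summary is exactly one allowed value.

    Hypothetical helper: surrounding whitespace is ignored, but any extra
    descriptive text makes the summary invalid.
    """
    return summary.strip() in set(allowed_values)


if __name__ == "__main__":
    allowed = ["Sunny", "Cloudy"]
    print(is_valid_categorical("Sunny", allowed))                # True
    print(is_valid_categorical("It looks Sunny today", allowed))  # False
```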

Adherence results appear as follows:

(Categorical):
{
  "rubrics": [
    {
      "question": "Does the summary follow the instruction and return only one of the allowed categorical values?",
      "reasoning": "The summary is not a categorical value. It contains descriptive text instead of providing only one of the allowed categorical values.",
      "is_addressed": "False"
    }
  ]
}
(Noncategorical):
{
  "rubrics": [
    {
      "question": "Does the summary follow the instruction 'State the product name being returned'?",
      "reasoning": "Summary followed instruction. It correctly stated the product name, for example: 'return the \\'Stealth Bomber X5\\' gaming mouse'.",
      "is_addressed": "True"
    }
  ]
}

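Each question is derived from the provided summary section definitions. The binary parameter is_addressed gives the result of the adherence evaluation, and the reasoning parameter gives the rationale.
If a question doesn't match your goal, the summary section definition for that goal is unclear. You can identify the problem and refine the section definition.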
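
A minimal sketch of reading an adherence result follows. Note that is_addressed appears in the examples as the string "True" or "False" rather than a JSON boolean, so the sketch normalizes it. The helper names and the "addressed rubrics divided by all rubrics" score are assumptions, not the documented scoring formula.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# The noncategorical adherence example from this section, as a JSON string.
SAMPLE_RESULT = r"""
{
  "rubrics": [
    {
      "question": "Does the summary follow the instruction 'State the product name being returned'?",
      "reasoning": "Summary followed instruction. It correctly stated the product name.",
      "is_addressed": "True"
    }
  ]
}
"""


def parse_flag(value: Any) -> bool:
    """Normalize a flag that may be a bool or a 'True'/'False' string."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def adherence_report(result_json: str) -> Tuple[Optional[float], List[Dict[str, Any]]]:
    """Return (assumed score, rubrics that were not addressed)."""
    rubrics = json.loads(result_json)["rubrics"]
    failed = [r for r in rubrics if not parse_flag(r["is_addressed"])]
    if not rubrics:
        return None, failed  # nothing to score
    return (len(rubrics) - len(failed)) / len(rubrics), failed


if __name__ == "__main__":
    score, failed = adherence_report(SAMPLE_RESULT)
    print(score)   # 1.0
    print(failed)  # []
```

Reviewing the question and reasoning of each failed rubric is one way to spot a section definition that needs to be reworded.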

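Completeness

Based on the instructions in the section definitions of the AI-generated summary, summarization autoevaluation applies rubrics to evaluate the summary's completeness. A low score means the summary is missing important information from the transcript.

The following is an example of completeness results: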

[
  {
    'question': "Does the summary follow 'Describe the specific actions the agent took to assist the customer with their issue or request'?",
    'content_list': [
      {
        'transcript_content': 'The agent provided the customer with the arrival window for the ABC appointment.',
        'related_content_from_summary': 'The agent, Robyn, provided the customer with the arrival window for the ABC appointment, which is from 01:30 PM to 2:45 PM.',
        'is_covered': 'True'
      },
      {
        'transcript_content': 'The agent clarified that the arrival window information is sent via text message.',
        'related_content_from_summary': 'The agent also clarified that the arrival window information is sent via text message',
        'is_covered': 'True'
      },
      {
        'transcript_content': "The agent confirmed the phone number is 123-456-7890.",
        'related_content_from_summary': "and confirmed the phone number is 123-456-7890.",
        'is_covered': 'True'
      }
    ]
  },
  {
    'question': "Does the summary follow 'Identify any dates explicitly mentioned by the agent or the customer'?",
    'content_list': [
      {
        'transcript_content': 'The ABC appointment is on June 2nd.',
        'related_content_from_summary': '',
        'is_covered': 'False'
      }
    ]
  },
  {
    'question': "Does the summary follow 'Identify the brand and any relevant specifications mentioned in the conversation'?",
    'content_list': [
      {
        'transcript_content': 'The appointment is for a Google Pixel.',
        'related_content_from_summary': '',
        'is_covered': 'False'
      }
    ]
  },
  {
    'question': "Does the summary follow 'Describe any updates the agent made, such as price, address, or order updates'?",
    'content_list': []
  },
  {
    'question': "Does the summary follow 'Extract the customer's order number and include it in the summary'?",
    'content_list': []
  }
]

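The preceding example shows the following scenarios:

If the summary covers the relevant content from the transcript, the binary parameter is_covered is set to True.
If the summary doesn't cover the relevant content from the transcript, the related_content_from_summary parameter is an empty string, which indicates that the summary didn't extract the relevant point. This lowers both the completeness score for that rubric and the final score. The binary parameter is_covered is also set to False.
If the transcript has no content related to the question, the content_list parameter is an empty list, which doesn't penalize the summary. The final aggregated score doesn't include this case.

Each question in the example is derived from the provided task description. The relevant information from the transcript is included as the value of the transcript_content parameter. The binary parameter is_covered gives the completeness result for that specific point, and related_content_from_summary gives the evidence. If a question doesn't match your goal, the summary's section definition is unclear. Identify the problem and refine the section definition.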
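
The three scenarios can be sketched as a small aggregation. The completeness example uses single-quoted strings, so the sketch reads it with `ast.literal_eval` rather than a JSON parser. The per-rubric score (covered items divided by all items) and the final score (mean of the scored rubrics) are assumptions made for illustration; the text states only that rubrics with an empty content_list are left out of the final aggregated score.

```python
import ast
from typing import Dict, Optional, Tuple

# A trimmed version of the completeness example from this section.
SAMPLE_RESULT = """
[
  {
    'question': "Does the summary follow 'Identify any dates explicitly mentioned by the agent or the customer'?",
    'content_list': [
      {
        'transcript_content': 'The ABC appointment is on June 2nd.',
        'related_content_from_summary': '',
        'is_covered': 'False'
      }
    ]
  },
  {
    'question': "Does the summary follow 'Describe the specific actions the agent took to assist the customer with their issue or request'?",
    'content_list': [
      {
        'transcript_content': 'The agent clarified that the arrival window information is sent via text message.',
        'related_content_from_summary': 'The agent also clarified that the arrival window information is sent via text message',
        'is_covered': 'True'
      }
    ]
  },
  {
    'question': "Does the summary follow 'Extract the customer's order number and include it in the summary'?",
    'content_list': []
  }
]
"""


def completeness_scores(result_text: str) -> Tuple[Dict[str, float], Optional[float]]:
    """Return (per-rubric scores, final score) under the assumed formulas."""
    rubrics = ast.literal_eval(result_text)
    per_rubric: Dict[str, float] = {}
    for rubric in rubrics:
        items = rubric["content_list"]
        if not items:
            # No related transcript content: excluded from the aggregate.
            continue
        covered = sum(1 for i in items if str(i["is_covered"]).strip().lower() == "true")
        per_rubric[rubric["question"]] = covered / len(items)
    if not per_rubric:
        return per_rubric, None
    return per_rubric, sum(per_rubric.values()) / len(per_rubric)


if __name__ == "__main__":
    scores, final = completeness_scores(SAMPLE_RESULT)
    print(len(scores))  # 2 (the rubric with an empty content_list is skipped)
    print(final)        # 0.5
```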
