原創(chuàng)于2009年08月02日,2009年10月22日遷移至此。
Oracle 分析函數(shù)——數(shù)據(jù)分布函數(shù)及 報(bào)表 函數(shù)
CUME_DIST
功能描述:計(jì)算一行在組中的相對(duì)位置, CUME_DIST 總是返回大于 0 、小于或等于 1 的數(shù),該數(shù)表示該行在 N 行中的位置。例如,在一個(gè) 3 行的組中,返回的累計(jì)分布值為 1/3 、 2/3 、 3/3
SAMPLE :下例中計(jì)算每個(gè)部門(mén)的員工按薪水排序依次累積出現(xiàn)的分布百分比
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist
FROM employees
NTILE
功能描述:將一個(gè)組分為 " 表達(dá)式 " 的散列表示,例如,如果表達(dá)式 =4 ,則給組中的每一行分配一個(gè)數(shù)(從 1 到 4 ),如果組中有 20 行,則給前 5 行分配 1 ,給下 5 行分配 2 等等。如果組的基數(shù)不能由表達(dá)式值平均分開(kāi),則對(duì)這些行進(jìn)行分配時(shí),組中就沒(méi)有任何 percentile 的行數(shù)比其它 percentile 的行數(shù)超過(guò)一行,最低的 percentile 是那些擁有額外行的 percentile 。例如,若表達(dá)式 =4 ,行數(shù) =21 ,則 percentile=1 的有 5 行, percentile=2 的有 5 行等等。
SAMPLE :下例中把 6 行數(shù)據(jù)分為 4 份
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile
FROM employees
PERCENT_RANK
功能描述:和 CUME_DIST (累積分配)函數(shù)類(lèi)似,對(duì)于一個(gè)組中給定的行來(lái)說(shuō),在計(jì)算那行的序號(hào)時(shí),先減 1 ,然后除以 n-1 ( n 為組中所有的行數(shù))。該函數(shù)總是返回 0 ~ 1 (包括 1 )之間的數(shù)。
SAMPLE :下例中如果 Khoo 的 salary 為 2900 ,則 pr 值為 0.6 ,因?yàn)? RANK 函數(shù)對(duì)于等值的返回序列值是一樣的
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS pr
FROM employees
ORDER BY department_id,salary;
PERCENTILE_DISC
功能描述:返回一個(gè)與輸入的分布百分比值相對(duì)應(yīng)的數(shù)據(jù)值,分布百分比的計(jì)算方法見(jiàn)函數(shù) CUME_DIST ,如果沒(méi)有正好對(duì)應(yīng)的數(shù)據(jù)值,就取大于該分布值的下一個(gè)值。
注意:本函數(shù)與 PERCENTILE_CONT 的區(qū)別在找不到對(duì)應(yīng)的分布值時(shí)返回的替代值的計(jì)算方法不同
SAMPLE :下例中 0.7 的分布值在部門(mén) 30 中沒(méi)有對(duì)應(yīng)的 Cume_Dist 值,所以就取下一個(gè)分布值 0.83333333 所對(duì)應(yīng)的 SALARY 來(lái)替代
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY salary ) OVER (PARTITION BY department_id) "Percentile_Disc",
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) "Cume_Dist"
FROM employees<!-- [if gte vml 1]><v:shape id="_x0000_i1028" type="#_x0000_t75" style='width:400.5pt;height:270pt'><v:imagedata src="file:///C:/Users/bq_wang/AppData/Local/Temp/msohtmlclip1/01/clip_image007.png" o:title=""/></v:shape><![endif]--><!-- [if !vml]--><!-- [endif]-->
PERCENTILE_CONT
功能描述:返回一個(gè)與輸入的分布百分比值相對(duì)應(yīng)的數(shù)據(jù)值,分布百分比的計(jì)算方法見(jiàn)函數(shù) PERCENT_RANK ,如果沒(méi)有正好對(duì)應(yīng)的數(shù)據(jù)值,就通過(guò)下面算法來(lái)得到值:
RN = 1+ (P*(N-1)) 其中 P 是輸入的分布百分比值, N 是組內(nèi)的行數(shù)
CRN = CEIL(RN) FRN = FLOOR(RN)
if (CRN = FRN = RN) then
(value of expression from row at RN)
else
(CRN - RN) * (value of expression for row at FRN) +
(RN - FRN) * (value of expression for row at CRN)
注意:本函數(shù)與 PERCENTILE_DISC 的區(qū)別在找不到對(duì)應(yīng)的分布值時(shí)返回的替代值的計(jì)算方法不同
算法太復(fù)雜,看不懂了 L
SAMPLE :在下例中,對(duì)于部門(mén) 60 的 Percentile_Cont 值計(jì)算如下:
P=0.7 N=5 RN =1+ (P*(N-1)=1+(0.7*(5-1))=3.8 CRN = CEIL(3.8)=4
FRN = FLOOR(3.8)=3
( 4 - 3.8 ) * 4800 + (3.8 - 3) * 6000 = 5760
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) "Percentile_Disc",
PERCENTILE_CONT(0.7) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) "Percentile_Cont",
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) "Percent_Rank"
FROM employees
總案例
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist, -- 數(shù)據(jù)分布百分比
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary) AS quartile,-- 數(shù)據(jù)分布,以 NTILE 中的 exp 來(lái)計(jì)算
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS pr,-- 數(shù)據(jù)分布百分比,從 0 開(kāi)始計(jì)
PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY salary ) OVER (PARTITION BY department_id) "Percentile_Disc",-- 輸入的分布百分比值相對(duì)應(yīng)的數(shù)據(jù)值
PERCENTILE_CONT(0.7) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) "Percentile_Cont"-- 表達(dá)式太復(fù)雜了, ...
FROM employees
RATIO_TO_REPORT
功能描述:該函數(shù)計(jì)算 expression/(sum(expression)) 的值,它給出相對(duì)于總數(shù)的百分比,即當(dāng)前行對(duì) sum(expression) 的貢獻(xiàn)。
SAMPLE :下例計(jì)算每個(gè)員工的工資占該類(lèi)員工總工資的百分比
SELECT
department_id,
first_name||' '||last_name employee_name,
salary,
RATIO_TO_REPORT(salary) OVER () AS rr
FROM employees
WHERE job_id = 'PU_CLERK';
REGR_ (Linear Regression) Functions
功能描述:這些線性回歸函數(shù)適合最小二乘法回歸線,有 9 個(gè)不同的回歸函數(shù)可使用。
REGR_SLOPE :返回斜率,等于 COVAR_POP(expr1, expr2) / VAR_POP(expr2)
REGR_INTERCEPT :返回回歸線的 y 截距,等于
AVG(expr1) - REGR_SLOPE(expr1, expr2) * AVG(expr2)
REGR_COUNT :返回用于填充回歸線的非空數(shù)字對(duì)的數(shù)目
REGR_R2 :返回回歸線的決定系數(shù),計(jì)算式為:
If VAR_POP(expr2) = 0 then return NULL
If VAR_POP(expr1) = 0 and VAR_POP(expr2) != 0 then return 1
If VAR_POP(expr1) > 0 and VAR_POP(expr2 != 0 then
return POWER(CORR(expr1,expr),2)
REGR_AVGX :計(jì)算回歸線的自變量 (expr2) 的平均值,去掉了空對(duì) (expr1, expr2) 后,等于 AVG(expr2)
REGR_AVGY :計(jì)算回歸線的應(yīng)變量 (expr1) 的平均值,去掉了空對(duì) (expr1, expr2) 后,等于 AVG(expr1)
REGR_SXX : 返回值等于 REGR_COUNT(expr1, expr2) * VAR_POP(expr2)
REGR_SYY : 返回值等于 REGR_COUNT(expr1, expr2) * VAR_POP(expr1)
REGR_SXY: 返回值等于 REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2)
(下面的例子都是在 SH 用戶下完成的)
SAMPLE 1 :下例計(jì)算 1998 年最后三個(gè)星期中兩種產(chǎn)品( 260 和 270 )在周末的銷(xiāo)售量中已開(kāi)發(fā)票數(shù)量和總數(shù)量的累積斜率和回歸線的截距
SELECT t.fiscal_month_number "Month", t.day_number_in_month "Day",
REGR_SLOPE(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_SLOPE,
REGR_INTERCEPT(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month) AS CUM_ICPT
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id IN (270, 260)
AND t.fiscal_year=1998
AND t.fiscal_week_number IN (50, 51, 52)
AND t.day_number_in_week IN (6,7)
ORDER BY t.fiscal_month_desc, t.day_number_in_month;
SAMPLE 2 :下例計(jì)算 1998 年 4 月每天的累積交易數(shù)量
SELECT UNIQUE t.day_number_in_month,
REGR_COUNT(s.amount_sold, s.quantity_sold)
OVER (PARTITION BY t.fiscal_month_number ORDER BY t.day_number_in_month)
"Regr_Count"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998 AND t.fiscal_month_number = 4;
SAMPLE 3 :下例計(jì)算 1998 年每月銷(xiāo)售量中已開(kāi)發(fā)票數(shù)量和總數(shù)量的累積回歸線決定系數(shù)
SELECT t.fiscal_month_number,
REGR_R2(SUM(s.amount_sold), SUM(s.quantity_sold))
OVER (ORDER BY t.fiscal_month_number) "Regr_R2"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND t.fiscal_year = 1998
GROUP BY t.fiscal_month_number
ORDER BY t.fiscal_month_number;
SAMPLE 4 :下例計(jì)算 1998 年 12 月最后兩周產(chǎn)品 260 的銷(xiāo)售量中已開(kāi)發(fā)票數(shù)量和總數(shù)量的累積平均值
SELECT t.day_number_in_month,
REGR_AVGY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgY",
REGR_AVGX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_month_desc, t.day_number_in_month)
"Regr_AvgX"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND s.prod_id = 260
AND t.fiscal_month_desc = '1998-12'
AND t.fiscal_week_number IN (51, 52)
ORDER BY t.day_number_in_month;
SAMPLE 5 :下例計(jì)算產(chǎn)品 260 和 270 在 1998 年 2 月周末銷(xiāo)售量中已開(kāi)發(fā)票數(shù)量和總數(shù)量的累積 REGR_SXY, REGR_SXX, and REGR_SYY 統(tǒng)計(jì)值
SELECT t.day_number_in_month,
REGR_SXY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxy",
REGR_SYY(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_syy",
REGR_SXX(s.amount_sold, s.quantity_sold)
OVER (ORDER BY t.fiscal_year, t.fiscal_month_desc) "Regr_sxx"
FROM sales s, times t
WHERE s.time_id = t.time_id
AND prod_id IN (270, 260)
AND t.fiscal_month_desc = '1998-02'
AND t.day_number_in_week IN (6,7)
ORDER BY t.day_number_in_month;
更多文章、技術(shù)交流、商務(wù)合作、聯(lián)系博主
微信掃碼或搜索:z360901061

微信掃一掃加我為好友
QQ號(hào)聯(lián)系: 360901061
您的支持是博主寫(xiě)作最大的動(dòng)力,如果您喜歡我的文章,感覺(jué)我的文章對(duì)您有幫助,請(qǐng)用微信掃描下面二維碼支持博主2元、5元、10元、20元等您想捐的金額吧,狠狠點(diǎn)擊下面給點(diǎn)支持吧,站長(zhǎng)非常感激您!手機(jī)微信長(zhǎng)按不能支付解決辦法:請(qǐng)將微信支付二維碼保存到相冊(cè),切換到微信,然后點(diǎn)擊微信右上角掃一掃功能,選擇支付二維碼完成支付。
【本文對(duì)您有幫助就好】元
