星期一, 十月 26, 2009
sed对同一文件替换导致文件清空的原因分析
做文件替换的时候我们经常想仅用一个文件,直接替换内容,但是这个在sed里面是不可以的。
sed 's/old/new/g' filename >filename (WRONG)
因为在执行前遇到> 重定向符号,unix先把文件清空,于是没有内容。
用sed替换同一文件,必须用一个临时文件。
sed 's/old/new/g' filename >filename_tmp
mv filename_tmp filename (CORRECT)
参考:
When you use redirection, the shell truncates the file before executing the command, and sed will see an empty file. You must redirect the output to a different file and then move or copy the new file over the old one, e.g.:
Pasted from <http://unix.ittoolbox.com/groups/technical-functional/shellscript-l/trying-to-change-a-file-with-sed-2242269>
星期三, 八月 12, 2009
GROUP BY, HAVING, SUM, AVG, and COUNT(*) 转载
Welcome to the Database Programmer!
Good programming skills do not lead magically to good database skills. Masterful use of the database requires knowledge of the database in its own terms. Step 1 is knowing your table design patterns, and Step 2 is knowing how to fashion efficient queries. Learning how to code good queries can lead to faster performance and better application code.
There is a new entry in this series every Monday morning, and the Complete Table of Contents is here.
Aggregation
You can use a SQL SELECT to aggregate data. Aggregation combines rows together and performs some operation on their combined values. Very common aggregations are COUNT, SUM, and AVG.
The simplest use of aggregations is to examine an entire table and pull out only the aggregations, with no other columns specified. Consider this SQL:
SELECT COUNT(*) as cnt ,SUM(sale_amount) as sum ,AVG(sale_amount) as avg FROM orders
If you have a very small sales order table, say about 7 rows, like this:
ORDER | DATE | STATE | SALE_AMOUNT ------+------------+-------+------------- 1234 | 2007-11-01 | NY | 10.00 1235 | 2007-12-01 | TX | 15.00 1236 | 2008-01-01 | CA | 20.00 1237 | 2008-02-01 | TX | 25.00 1238 | 2008-03-01 | CA | 30.00 1237 | 2008-04-01 | NY | 35.00 1238 | 2008-05-01 | NY | 40.00
Then the simple query above produces a one-row output:
CNT | SUM | AVG -----+------+----- 7 | 175 | 25
Some Notes on The Syntax
When we use COUNT(*) we always put the asterisk inside.
I have used the "as SUM" to specify a column name of the output. Without that I will get whatever the database server decides to call it, which will vary from platform to platform, so it is a good idea to learn to use the "AS" clause. Some folks would frown at using "SUM" as the name, since that is the name of the function and might be confusing, but I think we're all big kids and we can probably handle it.
The WHERE Clause Does What You Think
If you want to get just the sales from New York state, you can put a WHERE clause in:
SELECT COUNT(*) as cnt ,SUM(sale_amount) as sum ,AVG(sale_amount) as avg FROM orders WHERE state = 'NY'
...and you will get only the results for NY:
CNT | SUM | AVG ----+------+---------- 3 | 85 | 28.33333
Notice of course that the average has a repeating decimal. Most databases have a ROUND function of some sort, so I can correct that with:
SELECT COUNT(*) as cnt ,SUM(sale_amount) as sum ,ROUND(AVG(sale_amount),0) as avg FROM orders WHERE state = 'NY'
The Fun Begins With GROUP BY
The query above is fine, but it would be very laborious if you had to issue the query (or write a program to do it) for every possible state. The answer is the GROUP BY clause. The GROUP BY clause says that the aggregations should be performed for the distinct values of a column or columns. It looks like this:
SELECT state, ,COUNT(*) as cnt ,SUM(sale_amount) as sum ,ROUND(AVG(sale_amount),0) as avg FROM orders GROUP BY state
Which gives us this result:
STATE | CNT | SUM | AVG ------+-----+------+---- NY | 3 | 85 | 28 TX | 2 | 40 | 20 CA | 2 | 50 | 25
Note that if you try to include a column that you are not grouping on, such as zip code, most database servers will reject the query because there may be different values of zip code for the same value of state, and they have no way to know which one to pick for a given value of state.
HAVING Clause is Like WHERE after GROUP BY
The HAVING clause lets us put a filter on the results after the aggregation has taken place. If your Sales Manager wants to know which states have an average sale amount of $25.00 or more. Now our query looks like this:
SELECT state, ,COUNT(*) as cnt ,SUM(sale_amount) as sum ,ROUND(AVG(sale_amount),0) as avg FROM orders GROUP BY state HAVING AVG(sale_amount) >= 25
Which gives us this result, notice that Texas is now missing, as they were just not selling big enough orders (sorry 'bout that Rhonda).
STATE | CNT | SUM | AVG ------+-----+------+---- NY | 3 | 85 | 28 CA | 2 | 50 | 25
The Hat Trick: All Three
You can pull some pretty nice results out of a database in a single query if you know how to combine the WHERE, GROUP BY, and HAVING. If you have ever worked with a Sales Manager, you know they constantly want to know strange numbers, so let's say our Sales Manager says, "Can you tell me the average order size by state for all orders greater than 20? And don't bother with any average less 30.00" We say, "Sure, don't walk away, I'll print it out right now."
SELECT state ,COUNT(*) ,SUM(sale_amount) as sum ,ROUND(AVG(sale_amount) as avg FROM orders WHERE sale_amount > 20 GROUP BY state HAVING avg(sale_amount) >= 30
How to Do a Weighted Average
Consider the case of a table that lists test, homework and quiz scores for the students in a certain course. Each particular score is worth a certain percentage of a student's grade, and the teacher wants the computer to calculate each student's file score. If the table looks like:
STUDENT | WEIGHT | SCORE ------------+--------+------- NIRGALAI | 40 | 90 NIRGALAI | 35 | 95 NIRGALAI | 25 | 85 JBOONE | 40 | 80 JBOONE | 35 | 95 JBOONE | 25 | 70 PCLAYBORNE | 40 | 70 PCLAYBORNE | 35 | 80 PCLAYBORNE | 25 | 90
Then we can accomplish this in one pull like so:
SELECT student ,SUM(weight * score) / 100 as final FROM scores GROUP BY student
The nice thing about this query is that it works even if data is missing. If a student missed a test, they automatically get a zero averaged in.
Conclusion: Queries Are Where It's At
The only reason to put data into a database is to take it out again. The modern database has powerful strategies for ensuring the correctness of data going in (the primary key, foreign key and other constraints) and equally powerful tools for pulling the data back out.
星期三, 八月 05, 2009
nohup命令详解 zz
http://www.21andy.com/blog/20071121/677.html
使用nohup让程序永远后台运行
Unix/Linux下一般比如想让某个程序在后台运行,很多都是使用 & 在程序结尾来让程序自动运行。比如我们要运行mysql在后台:
但是加入我们很多程序并不象mysqld一样做成守护进程,可能我们的程序只是普通程序而已,一般这种程序使用 & 结尾,但是如果终端关闭,那么程序也会被关闭。但是为了能够后台运行,那么我们就可以使用nohup这个命令,比如我们有个test.php需要在后台运行,并且希望在后台能够定期运行,那么就使用nohup:
提示:
[~]$ appending output to nohup.out
嗯,证明运行成功,同时把程序运行的输出信息放到当前目录的 nohup.out 文件中去。
附:nohup命令参考
nohup 命令
用途:不挂断地运行命令。
语法:nohup Command [ Arg ... ] [ & ]
描述:nohup 命令运行由 Command 参数和任何相关的 Arg 参数指定的命令,忽略所有挂断(SIGHUP)信号。在注销后使用 nohup 命令运行后台中的程序。要运行后台中的 nohup 命令,添加 & ( 表示“and”的符号)到命令的尾部。
无论是否将 nohup 命令的输出重定向到终端,输出都将附加到当前目录的 nohup.out 文件中。如果当前目录的 nohup.out 文件不可写,输出重定向到 $HOME/nohup.out 文件中。如果没有文件能创建或打开以用于追加,那么 Command 参数指定的命令不可调用。如果标准错误是一个终端,那么把指定的命令写给标准错误的所有输出作为标准输出重定向到相同的文件描述符。
退出状态:该命令返回下列出口值:
126 可以查找但不能调用 Command 参数指定的命令。
127 nohup 命令发生错误或不能查找由 Command 参数指定的命令。
否则,nohup 命令的退出状态是 Command 参数指定命令的退出状态。
nohup命令及其输出文件
nohup命令:如果你正在运行一个进程,而且你觉得在退出帐户时该进程还不会结束,那么可以使用nohup命令。该命令可以在你退出帐户/关闭终端之后继续运行相应的进程。nohup就是不挂起的意思( n ohang up)。
该命令的一般形式为:nohup command &
使用nohup命令提交作业
如果使用nohup命令提交作业,那么在缺省情况下该作业的所有输出都被重定向到一个名为nohup.out的文件中,除非另外指定了输出文件:
在上面的例子中,输出被重定向到myout.file文件中。
使用 jobs 查看任务。
使用 fg %n 关闭。
另外有两个常用的ftp工具ncftpget和ncftpput,可以实现后台的ftp上传和下载,这样我就可以利用这些命令在后台上传和下载文件了。
星期四, 七月 16, 2009
sqlldr 关键词详解
节录重要部分 详情请见:
http://camden-www.rutgers.edu/help/Documentation/Oracle/server.815/a67792/ch06.htm
sqlldr ... Valid Keywords:
userid -- Oracle username/password control -- Control file name log -- Log file name bad -- Bad file name data -- Data file name discard -- Discard file name discardmax -- Number of discards to allow (Default all) skip -- Number of logical records to skip (Default 0) load -- Number of logical records to load (Default all) errors -- Number of errors to allow (Default 50) rows -- Number of rows in conventional path bind array or between direct path data saves (Default: Conventional Path 64, Direct path all) bindsize -- Size of conventional path bind array in bytes (System-dependent default) silent -- Suppress messages during run (header, feedback, errors, discards, partitions, all) direct -- Use direct path (Default FALSE) parfile -- Parameter file: name of file that contains parameter specifications parallel -- Perform parallel load (Default FALSE) readsize -- Size (in bytes) of the read buffer file -- File to allocate extents from
Using Command-Line Keywords
Keywords are optionally separated by commas. They are entered in any order. Keywords are followed by valid arguments.
For example:
SQLLDR CONTROL=foo.ctl, LOG=bar.log, BAD=baz.bad, DATA=etc.dat USERID=scott/tiger, ERRORS=999, LOAD=2000, DISCARD=toss.dis, DISCARDMAX=5