North M - Beer: 十一月 2011

2011年11月17日星期四

二进制日志中语句的顺序

今天看到文档上说二进制日志中语句的顺序会按照commit的顺序出现, 于是我猜想这样如果设置了slave的话, 因为slave都是单线程执行的, 那么执行的顺序是否也会按照commit的顺序执行.

由此, 一个auto increment的表, 如果首先插入的后commit, 会不会到slave端后执行, 从而最终产生的auto increment值会与master不一样.

结果证明, 不会!!

slave段的数据与master端的数据是一致的.

呵呵, 看来是我自己太....了.

2011年11月11日星期五

sql server2008丢失管理权限

今天又台sql server2008的server不小心将login中的本机administrator用户删掉了, 而sa密码又不记得了, 结果登录以后没有权限做任何修改.

按照网上一篇文章的办法成功找回:

1. server上建一个administrators组的用户

2. 用这个用户登录操作系统, 停掉mssqlserver服务, 并以单用户模式启动mssqlserver

3. 以windows认证方式登录sql server, 将administrator加回到login

4. 以adminisstrator重新登录系统, 并重新启动sql server服务.

2011年11月10日星期四

今天看了mysql官方文档的使用内存一项, 觉得比较有用, 遂翻译了下.
原文地址:http://dev.mysql.com/doc/refman/5.1/en/memory-use.html

1. 所有进程共享myisam key buffer, 该buffer由key_buffer_size决定
2. 进程自己的内存有
一个栈, thread_stack
一个连接缓存, net_buffer_length
一个结果缓存, net_buffer_length
这两个buffer最初大小为net_buffer_length指定的大小, 可以动态扩展到max_allowed_packet,
语句结束以后, 回缩回到net_buffer_length的大小
所有进程共享基础内存
当进程不再使用, 内存即释放, 除非其进入thread cache
5.1.4之前, 只有压缩的myisam表才可以使用memeory map, 但是5.1.4以后, 可以将myisam_use_map设为1, 使所有的myisam表均可以使用memory map'
每一个对表进行顺序扫描的进程都会分配一个read buffer, read_buffer_size
当以字面顺序来读取表的内容的时候, 可能会分配一个random-read buffer, 有read_rnd_buffer_size决定
所有的join操作都在一个单独的通道内完成, 大部分的排序甚至不需要用到临时表就可以完成, 大部分的临时表都是存储在内存中的hash表, 当行长度过大或者是包含blob类型的数据会存储在磁盘上.
当一个内存临时表过大, 就会转化为一个磁盘表, 并且也会转换为myisam, 可以修改允许的临时表大小
需要排序的请求会分配一个sort buffer, 并且产生0-2个临时文件, 这点取决于结果集大小
几乎所有的分析以及计算都会在一个可重用的进程内的内存池中完成,
当一个myisam表被打开的时候, 其索引只会被打开一次, 但是每个并行访问的进程都会分别打开数据文件. 对每个并行进程, 会分配表结构区, 列结构区, 一个一个3*n大小的buffer(n为最大行长度, 不包含blob列) 每个blob column会需要blob数据的长度加上5到8个字节的内存大小. myisam引擎还会需要另外一个buffer来作为内部使用.
对于每一个含有blob列的表, 会有一个额外的buffer来存储blob数据, 并且能够动态扩展
所有当前在使用的表的结构, 都会存储在缓存中, 并且使用FIFO来管理, 初始大小为table_open_cache, 如果两个表被同时打开, 那么在该缓存中会存在该表的两个入口.
使用flush tables语句或者mysqladmin flushtables命令会关闭所有当前未被使用的表, 而当前在使用的表也会被标记为使用结束后立即关闭, 这样可以有效的释放绝大多数当前在使用的内存. 知道所有的表都被关闭, 该命令才会返回
Grant, Create User, Create Server, Install Plugin的结构都会缓存在内存中, 对应的revoke, drop user, drop server, uninstall plugin语句也不会释放这些内存, 所以如果前面这些语句执行过多, 对应的内存占用也会增大.

2011年11月9日星期三

再谈read_rnd_buffer_size参数

<MySQL性能调优与架构设计>一书中在讲到read_rnd_buffer_size的时候, 说它是进行random scan的时候使用到的buffer, 但是我记得前面我读到过一篇资料, 说read_rnd_buffer_siza仅仅是在order by的时候能够起到有限的作用.

于是翻了一下前面的文章, 并且也参阅了mysql官方文档中<How MySQL Uses Memory>一部分, 发现那片文章中讲read_buffer_size是只是for MYISAM用的其实也是不正确的, 因为官方文档中说了, 这两个参数都是适用与所有存储引擎的.(不过那篇文章是07年写的, 那个时候也许还真是只for MYISAM)

说道的两篇文章参见:

http://www.mysqlperformanceblog.com/2007/07/24/what-exactly-is-read_rnd_buffer_size/

http://ourmysql.com/archives/232

其实read_rnd_buffer_size的作用在这两篇文张里面已经说得明白了, 我这里补充记录一下:

在第一种排序算法中, 因为放在sort buffer中的只有参与排序的列以及row point, 因为排序以后, row point是乱的, 是随机的, 所以直接按照排序的顺序去读取需要的数据会比较慢, 因为是随机读取. 而read_rnd_buffer_size的作用就在于将排序以后的列以及row point放到read_rnd_buffer中, 按照row point再进行一下排序, 然后按照这个顺序去读取数据, 运气好的话, 排序以后的row point对应的数据也是连续的(比如说按照类似于creation date这样的列进行排序), 这样读起来就会比较快了.

我觉得, 这样按照row point读取出来的数据有不是query需要的排序了, 所以还要按照原来sort buffer中的顺序再组织一下, 这一步应该也会带来一些cost.

key cache预加载

来自sky.jian<MySQL性能调优与架构设计>

mysql@sky:~$ cat /usr/local/mysql/etc/init.sql

SET GLOBAL hot_cache.key_buffer_size=16777216

SET GLOBAL cold_cache.key_buffer_size=16777216

CACHE INDEX example.top_message in hot_cache

CACHE INDEX example.event in cold_cache

LOAD INDEX INTO CACHE example.top_message,example.event IGNORE LEAVES

LOAD INDEX INTO CACHE example.user IGNORE LEAVES,exmple.groups

我发现重设key buffer size的时候, 会将key buffer中的数据全部清空.

MySQL的多key cache

我知道oracle中有keep buffer, recycle buffer, 接触MySQL后不知道mysql中有没有这样的机制, 今天知道mysql的key buffer cache也是有这种机制的.
首先设置一个另外的key cache:
SET GLOBAL hot_cache.eycache1.key_buffer_size=128*1024;
然后设置某些表使用这个key cache:
CACHE INDEX t1, t2, t3 IN hot_cache;
如果我们要去掉这个key cache, 我们只需要将该key cache大小设为0就可以了:
SET GLOBAL hot_cache.eycache1.key_buffer_size=0;
如果将某个key cache大小设为0, 原来归属于这个key cache的表就都归属到默认的key cache了
注意, 不能把默认的key cache大小设为0.

我们可以将这些放在系统启动脚本中完成, 这样每次重启就不会丢失原来的设置了:
key_buffer_size = 4G
hot_cache.key_buffer_size = 2G
cold_cache.key_buffer_size = 2G
init_file=/path/to/data-directory/mysqld_init.sql
在mysqld_init.sql中我们可以写:
CACHE INDEX db1.t1, db1.t2, db2.t3 IN hot_cache
CACHE INDEX db1.t4, db2.t5, db2.t6 IN cold_cache

关于key buffer usage ratio的算法

今天看简朝阳的<<MySQL性能优化与架构设计>>, 降到key buffer的时候, 书中有一个关于key buffer usage ratio的算法, 感觉应该是错的:

Key_buffer_UsageRatio = (1 - Key_blocks_used/(Key_blocks_used + Key_blocks_unused)) * 100%

这个应该是反的, 只需要Key_blocks_used/(Key_blocks_used + Key_blocks_unused) * 100% 就可以了.

2011年11月8日星期二

MySQL的Nested Loop算法

原文地址:http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html
普通Nested-Loop算法:
有以下join:
Table   Join Type
t1      range
t2      ref
t3      ALL
实际join示意:
for each row in t1 matching range {
for each row in t2 matching reference key {
   for each row in t3 {
     if row satisfies join conditions,
     send to client
   }
}
}
因为普通Nested-Loop一次只将一行传入内存循环, 所以外层循环有多少行, 内存循环便要执行多少次.
Block Nested-Loop Join:
这种算法指的是使用join buffer的Nested Loop, 将外层循环的行读进buffer, 从而减少内层循环的次数. 例如, 如果读入了10行进入join buffer, 那么内存循环就可以一次与这10行进行比较, 这样就能够显著减少内层循环表扫描的次数.
MySQL使用Join Buffer有以下要点:
1. join_buffer_size变量决定buffer大小
2. 只有在join类型为all, index, range的时候才可以使用join buffer
3. 能够被buffer的每一个join都会分配一个buffer, 也就是说一个query最终可能会使用多个join buffer
4. 第一个nonconst table不会分配join buffer, 即便其扫描类型是all或者index
5. 在join之前就会分配join buffer, 在query执行完毕即释放
6. join buffer中只会保存参与join的列, 并非整个数据行

前面描述的query, 如果使用join buffer, 那么实际join示意如下:

for each row in t1 matching range {
for each row in t2 matching reference key {
   store used columns from t1, t2 in join buffer
   if buffer is full {
     for each row in t3 {
       for each t1, t2 combination in join buffer {
         if row satisfies join conditions,
         send to client
       }
     }
     empty buffer
   }
}
}

if buffer is not empty {
for each row in t3 {
   for each t1, t2 combination in join buffer {
     if row satisfies join conditions,
     send to client
   }
}
}

如果t1, t2参与join的列长度只和为s, c为二者组合数, 那么t3表被scan的次数为
(S * C)/join_buffer_size + 1
随着join_buffer_size的增大, t3将减少, 知道join buffer能够容纳所有的t1, t2组合, 到了这点以后, 再增大join buffer size, query的速度就不会再变快了.

2011年11月7日星期一

join_buffer_size是怎样分配的

原文地址:http://www.mysqlperformanceblog.com/2010/07/05/how-is-join_buffer_size-allocated/

Join Buffer不像很多其他buffer一样是每一个线程(thread)分配的, 而是每一个线程的每一次join分配的(per-join-per-thread).
当Join方式为All, Index, Range时, 参与join的table 列的所有值便会被cache到分配的join buffer中. 如果query需要join多个表, 那么便会分配多个join buffer:
例如以下的join便会分配两个join buffer:

select * from a_table
join b_table on b.col1 = a.col1
join c_table on c.col2 = b.col2

我们还是回头来看join buffer是如何分配的, 源代码:

14176 /*****************************************************************************
14177   Fill join cache with packed records
14178   Records are stored in tab->cache.buffer and last record in
14179   last record is stored with pointers to blobs to support very big
14180   records
14181 ******************************************************************************/
14182
14183 static int
14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
14185 {
... snip ...
14268   cache->length=length+blobs*sizeof(char*);
14269   cache->blobs=blobs;
14270   *blob_ptr=0;             /* End sequentel */
14271   size=max(thd->variables.join_buff_size, cache->length);
14272   if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
14273     DBUG_RETURN(1);           /* Don't use cache */ /* purecov: inspected */
14274   cache->end=cache->buff+size;
14275   reset_cache_write(cache);
14276   DBUG_RETURN(0);
14277 }

从14721行我们可以看到, cache size 为 join_buffer_size与实际需要的大小中的较大的那个数, 并且是一次分配的, 也就是说, 如果你将join_buffer_size设的相当大, 害处是相当大的, 因为很容易就可以将内存耗尽.

Problem when monitor slave using MySQL Enterprise Monitor

Finally I successfully setup my Mysql Enterprise monitor to monitor the replication server of our production database.

It failed so many times. Every time I monitor the production database and slave database at the same time, the monitor server always identified them as the same server, and combine them into just one server, which puzzle me for a long time.

I have tried to use one single agent instances, multiple agent instances, and multiple agent installs. I thought the key problem might lie on the UUID of the server. Because no matter how I change the UUID in the agent configuration files, the UUID value I saw in the monitor page just never changes.

Finally I found an article said that every database stores a UUID value in mysql.inventory to identify it self.

So I checked it, sure enough I found same value in these two database. Sure they are!! I created the slave using the backup of the master server!!

I changed the slave's value, and it works!!

But why this not happened when I monitor another group database, it also includes one master and one slave. It's strange that they have different UUID value in mysql.inventory.

Maybe their version is higher!!

修改cacti脚本, 增加磁盘使用百分比

最近在部署cacti用来monitor各数据库服务器, 想要设置一个磁盘空间报警, 当分区满90的时候发alert mail, 但是看了现在已经设置的monitor disk space的data source, 里面并没有类似于used_percent项, 又懒得再去找其他的template, 变自己修改已经有的脚本来实现.

首先找到对应的data query:

我们知道了这个data query对应的xml文件, 于是找到这个xml文件, 编辑, 在field中增加一项:

<fields>
               <hrStorageIndex>
                       <name>Index</name>
                       <direction>input</direction>
                       <query_name>index</query_name>
               </hrStorageIndex>
......
               <hrStorageUsedPercent>
                       <name>Used Percent</name>
                       <direction>output</direction>
                       <query_name>used_percent</query_name>
               </hrStorageUsedPercent>

从这个文件中我们也可以看出, 所用的php文件:

<script_path>|path_cacti|/scripts/ss_host_disk.php</script_path>

对应编辑这个php文件(不会php, 只能看着改了):

if ($cmd == "index") {
$return_arr = ss_host_disk_reindex(cacti_snmp_walk($hostname, $snmp_community, $oids["index"], $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol, $snmp_priv_passphrase, $snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, $max_oids, SNMP_POLLER));

for ($i=0;($i<sizeof($return_arr));$i++) {
print $return_arr[$i] . "\n";
}

}elseif ($cmd == "num_indexes") {
$return_arr = ss_host_disk_reindex(cacti_snmp_walk($hostname, $snmp_community, $oids["index"], $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol, $snmp_priv_passphrase, $snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, $max_oids, SNMP_POLLER));

print sizeof($return_arr);

}elseif ($cmd == "query") {
$arg = $arg1;

$arr_index = ss_host_disk_reindex(cacti_snmp_walk($hostname, $snmp_community, $oids["index"], $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol, $snmp_priv_passphrase, $snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, $max_oids, SNMP_POLLER));
$arr = ss_host_disk_reindex(cacti_snmp_walk($hostname, $snmp_community, $oids[$arg], $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol, $snmp_priv_passphrase, $snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, $max_oids, SNMP_POLLER));

for ($i=0;($i<sizeof($arr_index));$i++) {
print $arr_index[$i] . "!" . $arr[$i] . "\n";
}
}elseif ($cmd == "get") {
$arg = $arg1;
$index = $arg2;

if ($arg == "used_percent")
{
$sau_used = preg_replace("/[^0-9]/i", "", db_fetch_cell("select field_value from host_snmp_cache where host_id=$host_id and field_name='hrStorageAllocationUnits' and snmp_index='$index'"));
$snmp_used = cacti_snmp_get($hostname, $snmp_community, $oids["used"] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
if($snmp_data_used<0){
$used = (abs($snmp_used) + 2147483647) * $sau_used;
} else {
$used = $snmp_used * $sau_used;
}

$sau_total = preg_replace("/[^0-9]/i", "", db_fetch_cell("select field_value from host_snmp_cache where host_id=$host_id and field_name='hrStorageAllocationUnits' and snmp_index='$index'"));
$snmp_total = cacti_snmp_get($hostname, $snmp_community, $oids["total"] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
if($snmp_total<0){
$total = (abs($snmp_total) + 2147483647) * $sau_total;
} else {
$total = $snmp_total * $sau_total;
}

return number_format($used * 100 / $total, 2);
}

if (($arg == "total") || ($arg == "used")) {
$sau = preg_replace("/[^0-9]/i", "", db_fetch_cell("select field_value from host_snmp_cache where host_id=$host_id and field_name='hrStorageAllocationUnits' and snmp_index='$index'"));
$snmp_data = cacti_snmp_get($hostname, $snmp_community, $oids[$arg] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
if($snmp_data<0){
return (abs($snmp_data) + 2147483647) * $sau;
} else {
return $snmp_data * $sau;
}
}else{
return cacti_snmp_get($hostname, $snmp_community, $oids[$arg] . ".$index", $snmp_version, $snmp_auth_username, $snmp_auth_password, $snmp_auth_protocol,$snmp_priv_passphrase,$snmp_priv_protocol, $snmp_context, $snmp_port, $snmp_timeout, $ping_retries, SNMP_POLLER);
}
}

接下来就可以到cacti中操作了:
在data template中增加一个:

在data query中关联graph template以及 date template的地方:

然后修改graph template(其实应该没有必要这么加, 因为这个百分比相对于磁盘大小来说实在太小了, 在同一张图里面根本显示不出来, 只是我现在没研究透, 就全加了):

然后我们就可以添加threshold了.

2011年11月2日星期三

Setup a replication of MYISAM database

We have a database running on "192.168.1.106", and a DR site "10.1.0.4". Now we want to setup a slave database on "10.1.0.4".

Most tables on "192.168.1.106" are MYISAM.

I think the most difficult part is to get a consistent copy of the whole database, because most of the tables on "192.168.1.106" are MYISAM tables. The general approach is to shutdown it or lock the whole database for a long time.

I have monitored the database server for days, try to get a accurate time when the loading is very low, so I can lock the database and minimize the side effect. Unfortunately I didn't get it.

The day before tomorrow I found that the "cp" command had a "update" option, which can copy only the updated files.

So I copied the whole database when the database is running, then locked it in read mode, got the bin-log position(for setup slave), and used the "update" option to copy the incremental part. The lock process just lasted for seconds.

Then I transferred the copy to "10.1.0.4", checked the configure file to make sure the database can work as a slave.

Because the database version on "10.1.0.4" is 5.1.53, which is higher than "192.168.1.106", so I run the "mysql_upgrade" command after I started the server.

Then I setup the slave, using the replication user I setup on "192.168.1.106" and the bin-log position I got when locked the database.

I found some tables were corrupt after I started the slave process, and I don't know why.

I can just re-dump then from "192.168.1.106" to import then into "10.1.0.4". It's too dangers to do it like this, because this would lead duplicate rows in those tables after restarted the slave process.

As expected, I saw "duplicate rows" error when I used "show slave status".

So I deleted the rows generated after I did the last "update" optioned "cp" command, and restarted the slave. It works!!!

2011年11月1日星期二

abs的tuning

上周收到一个报表sql语句, 跑的很慢, 几个小时还出不来, 锁死很多production的session.

原来业务上会cancel掉一个order或者order中的某个item, 而由于老系统设计的关系, cancel的时候不是改状态, 而是将对应的id设为负值.

而现在的报表需要找到曾经出现过某个状态的item列表, 而这些item现在可能是cancel掉也可能是正常的.

主要串了3张表, 三张表上order_id, item_id都有索引.

但是因为正负值的关系, 串表的时候都使用了abs, 这就到这3个表上的索引在则个sql中全部无效.

tuning方法是将abs串表改为正常的正负值串表, 最终用union, 将结果集合并, 并处理好串表时候的条件, 保证至少有一个join部分能够走到索引.

这样最终结果虽然效率也不是很高, 因为每段sql语句都还是会有一部分要走全表扫描, 但只要控制好表连接顺序将走全表扫描的尽量放到最后, 效果已经比原来的好多了. 原来几个小时也跑不出来的, tuning以后十几秒就出来了.

订阅：评论 (Atom)

North M - Beer