Friday, September 26, 2014

Fastest bit counting

The best bit counting algorithm as far as I know is the one invented by folks at Stanford University, which is always O(1).

int bitcount(int n)
{
    int cnt = 0;

    n = n - ((n >> 1) & 0x55555555);
    n = (n & 0x33333333) + ((n >> 2) & 0x33333333);

    return (((n + (n >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

For example, after I compile it to x86 Assembly:

    .cfi_startproc
    movl    4(%esp), %eax
    movl    %eax, %edx
    sarl    %edx
    andl    $1431655765, %edx
    subl    %edx, %eax
    movl    %eax, %edx
    sarl    $2, %eax
    andl    $858993459, %edx
    andl    $858993459, %eax
    addl    %edx, %eax
    movl    %eax, %edx
    sarl    $4, %edx
    addl    %edx, %eax
    andl    $252645135, %eax
    imull   $16843009, %eax, %eax
    sarl    $24, %eax
    ret
    .cfi_endproc


But, how does actually the algorithm work?

Ok, let try a simple one.  Assume n is a 4-bit number, and the bits can be represented as a set such that n= {a,b,c,d}., where a,b,c.. can only have either 0 or 1.  The decimal value of the n is: 8*a + 4*b + 2*c + d.  Total number of bit one is: a + b + c +d.  

For example, for n=0b1101, a=1, b=1, c=0, d=1 (which in decimal is 8*1 + 4*1 + 2*0 + 1 = 13), and total number of bit one is a+b+c+d = 1 + 1 + 0 + 1 = 3.  So far so good?

Now, we know that to count the 1-bits is as simple as: a+b+c+d.  But, wait.... n itself is not a+b+c+d, but 8*a + 4*b + 2*c + d.  Ok, let's conquer this step-by-step.  

If we shift n one bit to the right the LSbit is gone and other numbers just divided by two, so n becomes 4*a + 2*b + c, right? Now substract this to the original n.
  
n      = 8*a + 4*b + 2*c + d
n>>1   = 0 +   4*a + 2*b + c
----------------------------- -
       = 0   + 4*a + 2*b + c + d

That's a good direction!  Now if  (n>>1)  is written in the 4-bit nibble it is actually 0 + 4a + 2b + c.  If I just want to subtract 4a and c, we have to mask out 2*b.  so we mask it with binary 0101 (0x5), so we get only 4a + c:

n              = 8*a + 4*b + 2*c + d
(n>>1)&0x5     = 4*a +   0 + c
---------------------------------- -
n -(n>>1)&0x05 = 4*a + 4*b + c + d

Now store this result back to n, so from now on n is 4*a + 4*b + c + d
To get c + d only, we mask n with 0b11 or n & 0x03
if we shift n above once, we get 0 + 2a + 2b + 0, but if shift it again it becomes a + b!
To make sure we only get the lowest two bits (a + b), we mask it to 0x03 or:

n & 0x03       = c + d
(n >> 2)&0x03) = a + b
------------------------ +

Nice! now we have this expression: a + b + c +d .

so, for 4-bit bit counting, we can use the expression:

n = n - (n>>1) & 0x05
bits = (n & 0x3) + (n>>2) & 0x3

Proof: as example above, n=13 (0b1101).  so a=1,b=1, c=0, d=1

n = 0b1101 - (0b0110) & 0b0101 = 0b1101 - 0b100 = 13 - 4 = 9 = 0b1001
then the next step:
bits =(0b1001 & 0b0011)  +  (0b1001>>2) & 0b0011, or bits = 0b0001 + (0b0010) & 0b0011
 or bits = 1 + 2 = 3 !!

For 32-bit, it is based on the same idea, except we have to do more.

Say n  has set of coefficients {a[0], a[1], ...., a[31]} to represent the number, so n = a[0]*2^31 + a[1]*2^30 + .... + a[15]*2^0

The mask should be 32-bit, so instead of 0x5, we use 0x55555555 = 0b0101010101...0101

n                   = 2^31*a[0] + ... + 2^1*a[30] + 2^0*a[31]
(n>>1)&0b0101..0101 = 2^30*a[0] + ... + 2^2*a[28] + 2^0*a[30]
-------------------------------------------------------------- -
n -(n>>1)&0x055555555 = 2^31*a[0] - 2^28*a[0] + (2^30*a[1] - 2^26*a[1]) + ... + 4*a[29] - 2*a[30] + a[31]


Let's review binary arithmetics.

23 - 22 = 8 - 4 = 4
216 - 215 = 65536 - 32768 = 32768
or: 2(a+1) - 2a = 2a

2(a+2) - 2a = 2a * 22 - 2a = 2a (4 - 1) = 3*2a

So the result is:

a[0] * (2^31 - 2^28) + a[1] * (2^30 - 2^26) + ..... + a[30] (4 -2) + a[31]
= 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]

n - n(>>1) & 0x055555555 = 3*2^28*a[0] + .3*2^26*a[1].. + 2*a[30] + a[31]

stored this as a new n.

n>>2 = 2^24*a[0] + 2^22*a[1] + ...2*a[28] + a[29]

The rest is actually manipulation to count this a[0] + a[1] + ...  + a[31]

A variant, but this one is invented by folks at MIT:

int bitcount(unsigned int n)
{
    int cnt = 0;
    register unsigned int tmp;
                     
    tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) &    011111111111);

    return ((tmp + (tmp >>3)) & 0030707070707) % 63;
}

It uses the similar method. but with different approach (notice the number 01..., 0333... and 00307... are in octal.  We could use Hex version but then it is harder to remember)





Monday, June 2, 2014

Audio on Laptop HP dv7-6c80us not working

Problem:

HP-PAVILION-DV7:~$ aplay -l
aplay: device_list:268: no soundcards found...
HP-PAVILION-DV7:~$ 

HP-PAVILION-DV7:~$ lsmod | grep snd
snd_hda_codec_hdmi     41154  0 
snd_hda_codec_idt      50341  1 
snd_hda_intel          52267  0 
snd_hda_codec         188738  3 snd_hda_codec_hdmi,snd_hda_codec_idt,snd_hda_intel
snd_hwdep              13602  1 snd_hda_codec
snd_pcm               102033  3 snd_hda_codec_hdmi,snd_hda_codec,snd_hda_intel
snd_page_alloc         18710  2 snd_pcm,snd_hda_intel
snd_seq_midi           13324  0 
snd_seq_midi_event     14899  1 snd_seq_midi
snd_rawmidi            30095  1 snd_seq_midi
snd_seq                61560  2 snd_seq_midi_event,snd_seq_midi
snd_seq_device         14497  3 snd_seq,snd_rawmidi,snd_seq_midi
snd_timer              29433  2 snd_pcm,snd_seq
snd                    69141  11 snd_hwdep,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_idt,snd_pcm,snd_seq,snd_rawmidi,snd_hda_codec,snd_hda_intel,snd_seq_device,snd_seq_midi
soundcore              12680  1 snd

HP-PAVILION-DV7:~$ cat /proc/asound/pcm 
00-00: 92HD81B1X5 Analog : 92HD81B1X5 Analog : playback 1 : capture 1

HP-PAVILION-DV7:~$  lspci -v
...
...
00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
Subsystem: Hewlett-Packard Company Device 165b
Flags: bus master, fast devsel, latency 0, IRQ 54
Memory at c6500000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [130] Root Complex Link
Kernel driver in use: snd_hda_intel



Thursday, February 6, 2014

The Magic of 37

If x is a multiplication of 3 (3,6,9,12,15,....)

37 * x = y1y2y3 = z

 y1 + y2 + y3 = x
 y1 = z div 100
 y2 = (z mod 100) div 10
 y3 = (z mod 10) div 1

Examples:
37*3 = 111
1+1+1 = 3

37*6 = 222
2+2+2 = 6

Interesting, isn't??

Not really so.   x is multiplication of 3, so x = 3*b (where b=1,2,3...)
37*x = 37 * (3*b) = 111*b = bbb

so bbb = y1y2y3 = yyy

or y = b

When we do y1 + y2 + y3, it is actually b + b + b.  For example, pick b=2 (so x=6), 111*b = 222, 2+2+2=6=x.



Tuesday, December 31, 2013

Value with a constant growth factor

Many times we hear in the news, either financial news or economy in general, about growths with constant factor.  For example: "The appreciated value of residential property in townville grows at 4% per year', or "The stock price grows constantly at 5% per year", or "the population of villageville decreases by constant factor of 5%".

What does it mean?

Well, it is actually simple.  The value increases by factor of 4%.  If the current value is A0, next year its value is A0 + 4%*A0.  Next 2 years the value is A1 + 4%*A1 = A0*(1+4%)^2 and so on.  From this, we can deduct a general formula for a growth (which is a form of geometric series):

A(t) = A(0) * (1+g/100)^t

Where A(0) is the value at the initial evaluation (t = 0)
t = unit time for the growth
g = percentage of growth (in %), so we need to divide it by 100 there
A(t) = the value at t

From this formula, we also can find "Doubling Time", or the time needed for a value to be double.

Tdouble = ln(2) / ln( 1+g/100)

For example:
Michael bought his house in 2002 for $315,000.  The average appreciation rate of properties in his area is 2%/year.  How long he has to keep his house to make the house value double (assuming the appreciation rate stays the same)?

Answer:

Tdouble = ln(2) / ln( 1 + 2/100) = 0.693/0.0198 = 35 years.


P.S:
For negative growth (decrease), use negative g.

More complex calculation is if the growth factor is not constant.


Monday, December 16, 2013

How to mount disk used by ReadyNAS

The following steps are useful if we want to salvage data stored in the drive in ReadyNAS.
I am not sure if the steps below are going to work on other ReadyNAS models, but it works on my ReadyNAS Duo (Sparc CPU).

Basically, what we need is a SATA-to-USB cable (can be bought on the Internet for couple of bucks).
NETGEAR ReadyNAS partitions the drive into 4 partitions.  In my case, it is detected as /dev/sdc:


[root@r3000 media]# fdisk -l /dev/sdc

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               2     4096001     2048000   83  Linux
/dev/sdc2         4096002     4608001      256000   82  Linux swap / Solaris
/dev/sdc3         4608002  1953092233   974242116    5  Extended
/dev/sdc5         4608003  1953092233   974242115+  8e  Linux LVM
[root@r3000 media]# 


There are couple of issues if we try to mount the partitions directly:

  1. ReadyNAS uses non-standard ext3 block-size, which in my case is 16384 bytes (use command "lvsc" to check)
  2. The home directory is partitioned as LV group, so conventional mount command is not gonna work
Here's the steps:
  • Scan the usb for LVM volumes and identify in the output the volume group name that has your READYNAS volume (mine proved to be c):
# vgscan
[root@r3000 media]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "c" using metadata type lvm2
  Found volume group "vg_r3000" using metadata type lvm2
  Found volume group "VolGroup" using metadata type lvm2

  • Activate the group for ReadyNAS (in this case, the group name is "C")
# vgchange -ay c
  • Find the logical volume that  (mine proved to be 'c'):
# lvs
  • root@r3000:/home/root# lvs
      LV   VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
      c    c    -wi-ao--- 929.09g  
     
  • To display the logical name of the partition, use command "lvdisplay":
# lvdisplay /dev/c
  --- Logical volume ---
  LV Name                /dev/c/c
  VG Name                c
  LV UUID                7HUOrf-B5bL-ur6r-ULsd-yl4m-gCrA-zQc4s9
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                929.09 GiB
  Current LE             29731
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:5


We cannot use regular command "mount" to mount the non-standard blocksize ext3 partition.  Fortunately, there is a tool called "fuse-ext3" running in userspace that can help us.  The tool can be downloaded here.

Here's an example how to mount my ReadyNAS's LV volume:

fuse-ext2 -o sync_read,allow_other,rw+ /dev/c/c /media/readynas2

And here is the command to mount the system (root) partition which has the Linux software (I will post later about how to reset root password without resetting the ReadyNAS to factory default etc.)

fuse-ext2 -o sync_read,allow_other,rw+ /dev/c/c /media/readynas2


Note: don't forget to unmount the partitions once your'de done.



Why we need to migrate to IPv6 sooner



The test result above was done using DOCSIS 3.0 modem, XFinity Blast and thru 5 MHz Wi-Fi (802.11n).
See how IPv6 improves the download speed more than 3 times of IPv4 in download test?

I think of the reason is that with IPv6, we no longer need NAT on the router.  So the router most likely bypassed the traffic and directly forward it from the server to my computer (using global IP address).

The latency also improves with IPv6.  This might be caused due to the fixed size of IPv6 header, unlike IPv4.  All other optional headers in IPv6 are moved beyond the header, so routers can forward packets faster.