[Dockdev] Running Dock in parallel

Ricardo Nicoluci nicoluci at gmail.com
Thu Sep 15 07:09:16 PDT 2005


Hi,

I just installed Dock to run on a cluster. Instead of using the usual 
MPI configuration, I took advantage of the fact that our cluster runs 
OpenMosix. Rather than installing MPI on every cluster node, or sharing the 
MPI installation folder and the data among the nodes through NFS, I simply 
start all eight processes on node1 and let OpenMosix manage them for me. I 
like this setup a lot because OpenMosix does automatic load balancing, so it 
tends to use all the machines evenly, even taking into account machines that 
are faster than others. However, sometimes when I run Dock 5.2 in parallel 
it gives the following error:

p3_1766: p4_error: net_recv read: probable EOF on socket: 1
p5_1808: p4_error: net_recv read: probable EOF on socket: 1
p8_1894: p4_error: net_recv read: probable EOF on socket: 1
bm_list_1718: (9296.991802) wakeup_slave: unable to interrupt slave 0 pid 1717
bm_list_1718: (9296.992147) wakeup_slave: unable to interrupt slave 0 pid 1717
rm_l_1_1735: (9294.355503) net_send: could not write to fd=6, errno = 9
rm_l_1_1735: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_2_1753: (9293.837317) net_send: could not write to fd=6, errno = 9
rm_l_2_1753: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_4_1794: (9290.788141) net_send: could not write to fd=6, errno = 9
rm_l_4_1794: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_6_1838: (9285.729331) net_send: could not write to fd=6, errno = 9
rm_l_6_1838: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_7_1861: (9283.199900) net_send: could not write to fd=6, errno = 9
rm_l_7_1861: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
could not get processgroup for 2069

It seems to be a problem with a slave process connecting to the master 
process of the MPI parallelization. I know that OpenMosix may be causing 
this error, but searching on Google I found that it can occur even on a 
pure MPI system. Has anyone ever run into it? Any information from the 
developers?
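For reference, "errno = 9" in the log is EBADF ("Bad file descriptor"), which the OS returns when a process writes to a descriptor that is no longer open in that process. The short Python sketch below is only an illustration of that failure mode under that assumption; it is not code from Dock or MPI, and the function name write_after_close is hypothetical. It reproduces the same errno by closing one end of a socket pair and then writing to its old descriptor number.

```python
import errno
import os
import socket


def write_after_close() -> int:
    """Write to a socket descriptor that has already been closed and
    return the errno reported by the OS (EBADF, which is 9 on Linux)."""
    a, b = socket.socketpair()
    fd = a.fileno()  # raw descriptor number, like "fd=6" in the log
    a.close()        # the descriptor is now invalid in this process
    try:
        os.write(fd, b"x")  # hypothetical stand-in for the failing net_send write
    except OSError as exc:
        return exc.errno
    finally:
        b.close()
    return 0  # only reached if the write unexpectedly succeeded


if __name__ == "__main__":
    code = write_after_close()
    print(code, errno.errorcode[code], os.strerror(code))
```

If this reading is right, the rm_l_* processes were writing to sockets that had already been torn down, which fits the "probable EOF on socket" messages from p3, p5 and p8 just before them; what closed those sockets is still the open question.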

Thanks,


-- 
Ricardo de Paula Nicoluci
PhD Student
Medicinal Chemistry Laboratory
IFSC - University of Sao Paulo
Sao Carlos - SP - Brazil