[Dockdev] Running Dock in parallel
Ricardo Nicoluci
nicoluci at gmail.com
Thu Sep 15 07:09:16 PDT 2005
Hi,
I just installed Dock to run on a cluster. Instead of using the usual
MPI configuration, I took advantage of the fact that our cluster has an
OpenMosix setup. Rather than installing MPI on every cluster node, or
sharing the MPI installation folder and the data among the nodes through
NFS, I just start all eight processes on node1 and let OpenMosix manage
them for me. I like this very much because it has automatic load
balancing, so it tends to use all the machines equally, even taking into
account that some machines are faster than others. However, sometimes
when I run Dock 5.2 in parallel it gives the following error:
p3_1766: p4_error: net_recv read: probable EOF on socket: 1
p5_1808: p4_error: net_recv read: probable EOF on socket: 1
p8_1894: p4_error: net_recv read: probable EOF on socket: 1
bm_list_1718: (9296.991802) wakeup_slave: unable to interrupt slave 0 pid 1717
bm_list_1718: (9296.992147) wakeup_slave: unable to interrupt slave 0 pid 1717
rm_l_1_1735: (9294.355503) net_send: could not write to fd=6, errno = 9
rm_l_1_1735: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_2_1753: (9293.837317) net_send: could not write to fd=6, errno = 9
rm_l_2_1753: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_4_1794: (9290.788141) net_send: could not write to fd=6, errno = 9
rm_l_4_1794: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_6_1838: (9285.729331) net_send: could not write to fd=6, errno = 9
rm_l_6_1838: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
rm_l_7_1861: (9283.199900) net_send: could not write to fd=6, errno = 9
rm_l_7_1861: p4_error: net_send write: -1
p4_error: latest msg from perror: Bad file descriptor
could not get processgroup for 2069
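For reference, the "errno = 9" / "Bad file descriptor" pair in the
messages above is the standard EBADF error, which a write returns when
the descriptor it is given is no longer open. The small Python sketch
below only illustrates that general meaning. It is not taken from Dock
or from the MPI library, the function name is made up for the example,
and it says nothing about why the descriptor was closed in my runs.

```python
import errno
import os


def write_to_closed_fd():
    """Write to an already-closed descriptor and return the resulting errno.

    Hypothetical helper for illustration only; it mimics the generic
    condition behind "could not write to fd=..., errno = 9".
    """
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # the descriptor is now invalid
    try:
        os.write(write_fd, b"ping")  # expected to fail with EBADF
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
    return None


if __name__ == "__main__":
    code = write_to_closed_fd()
    # On Linux, errno.EBADF has the value 9.
    print(code == errno.EBADF, os.strerror(errno.EBADF))
```

So the rm_l_* processes appear to be writing to a socket that had already
been closed on their side, presumably after the other end went away.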
It seems to be a problem with a slave process connecting to the master
process of the MPI parallelization. I know that OpenMosix may be causing
this error, but searching for it I found that it can occur even on a pure
MPI system. Has anyone seen this before? Any information from the
developers?
Thanks,
--
Ricardo de Paula Nicoluci
PhD Student
Medicinal Chemistry Laboratory
IFSC - University of Sao Paulo
Sao Carlos - SP - Brazil