Error on sensor


Profile Dingo
Joined: 16 Jun 11
Posts: 48
Credit: 438,488
RAC: 85

Message 1581 - Posted: 13 Jan 2013 | 3:08:47 UTC
Last modified: 13 Jan 2013 | 3:14:00 UTC

I have been running the sensor on a Linux PC for a few days and it was running OK, but the last few work units show this error repeated over and over.

I have removed the project and reattached, and I have unplugged the sensor, but nothing seems to fix this.



claimed interface, No error,0
sensors.xml: 6 nodes found
Found sensor v2.01
9770,3,2013-1-13 3:3:52,513,r,381683167
14:04:02 (2725): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
19920,8,2013-1-13 3:4:3,513,n,381683167
14:04:12 (2793): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
31060,16,2013-1-13 3:4:14,513,n,381683167
14:04:24 (2832): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
42120,21,2013-1-13 3:4:25,513,n,381683167
14:04:35 (2840): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
52270,23,2013-1-13 3:4:35,513,n,381683167
14:04:45 (2849): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
62400,26,2013-1-13 3:4:45,513,n,381683167
14:04:55 (2854): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
72720,29,2013-1-13 3:4:55,513,n,381683167
14:05:05 (2867): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
82840,35,2013-1-13 3:5:6,513,n,381683167
14:05:15 (2959): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
93000,37,2013-1-13 3:5:16,513,n,381683167
14:05:26 (2972): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
104030,47,2013-1-13 3:5:27,513,n,381683167
14:05:37 (2977): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
114190,51,2013-1-13 3:5:37,513,n,381683167
14:05:47 (2990): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
125270,57,2013-1-13 3:5:48,513,n,381683167
14:05:58 (3047): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
135460,63,2013-1-13 3:5:58,513,n,381683167
14:06:08 (3052): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
146510,65,2013-1-13 3:6:9,513,n,381683167
14:06:19 (3137): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
156660,68,2013-1-13 3:6:20,513,n,381683167
14:06:29 (3142): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
167770,72,2013-1-13 3:6:31,513,n,381683167
14:06:41 (3148): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
177870,79,2013-1-13 3:6:41,513,n,381683167
14:06:51 (3152): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
claimed interface, could not get bound driver: No data available,0
sensors.xml: 6 nodes found
Found sensor v2.01
189020,83,2013-1-13 3:6:52,513,n,381683167

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I looked back through my tasks, and the errors started when a new version (v1.72) was loaded.

Task     Work unit  Computer  Sent                        Reported                    Status                   Run time (s)  CPU time (s)  Credit  Application
1372607  1353320    5591      11 Jan 2013 | 16:20:34 UTC  11 Jan 2013 | 16:44:28 UTC  Error while computing    10.06         0.01          ---     Radioactivity Monitor v1.72 (nci)
1372552  1353265    5591      11 Jan 2013 | 15:56:13 UTC  11 Jan 2013 | 16:20:34 UTC  Error while computing    10.07         0.01          ---     Radioactivity Monitor v1.72 (nci)
1372461  1353174    5591      11 Jan 2013 | 15:31:46 UTC  11 Jan 2013 | 15:56:13 UTC  Error while computing    10.06         0.01          ---     Radioactivity Monitor v1.72 (nci)
1371135  1351848    5591      11 Jan 2013 | 8:35:18 UTC   11 Jan 2013 | 15:31:46 UTC  Completed and validated  24,962.95     9.27          42.79   Radioactivity Monitor v1.69 (nci)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Profile Dingo
Joined: 16 Jun 11
Posts: 48
Credit: 438,488
RAC: 85

Message 1582 - Posted: 13 Jan 2013 | 3:53:41 UTC
Last modified: 13 Jan 2013 | 4:13:39 UTC

I had a copy of the old radac app, so I put it and sensors.xml in the project folder and created an app_info.xml based on this post: http://radioactiveathome.org/boinc/forum_thread.php?id=46

I have moved the sensor back over to a Windows PC so that I get credit for it running and you get some data. But I would still like it to run on my Linux PC, as that runs 24/7.

<app_info>
    <app>
        <name>radac</name>
        <user_friendly_name>R@H beta test</user_friendly_name>
    </app>
    <file_info>
        <name>radac</name>
        <executable/>
    </file_info>
    <file_info>
        <name>sensors.xml</name>
    </file_info>
    <app_version>
        <app_name>radac</app_name>
        <version_num>169</version_num>
        <file_ref>
            <file_name>radac</file_name>
            <open_name>radac_1.69_i686-pc-linux-gnu_nci</open_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>sensors.xml</file_name>
            <copy_file/>
        </file_ref>
    </app_version>
</app_info>
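
A minimal sketch, assuming Python 3 with only the standard library, of a consistency check for the app_info.xml above. It only confirms that the XML is well formed and that every file_ref and app_version points at a declared file_info and app. It does not encode BOINC's own validation rules, and `check_app_info` is a hypothetical helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# The app_info.xml exactly as posted above.
APP_INFO = """<app_info>
<app>
<name>radac</name>
<user_friendly_name>R@H beta test</user_friendly_name>
</app>
<file_info>
<name>radac</name>
<executable/>
</file_info>
<file_info>
<name>sensors.xml</name>
</file_info>
<app_version>
<app_name>radac</app_name>
<version_num>169</version_num>
<file_ref>
<file_name>radac</file_name>
<open_name>radac_1.69_i686-pc-linux-gnu_nci</open_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>sensors.xml</file_name>
<copy_file/>
</file_ref>
</app_version>
</app_info>"""


def check_app_info(xml_text):
    """Cross-check names inside an app_info.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    declared = {fi.findtext("name") for fi in root.iter("file_info")}
    referenced = {fr.findtext("file_name") for fr in root.iter("file_ref")}
    app_names = {app.findtext("name") for app in root.iter("app")}
    version_apps = {ver.findtext("app_name") for ver in root.iter("app_version")}
    return {
        "undeclared_files": sorted(referenced - declared),
        "unused_files": sorted(declared - referenced),
        "unknown_apps": sorted(version_apps - app_names),
    }


if __name__ == "__main__":
    print(check_app_info(APP_INFO))
```

All three lists come back empty for the file as posted, so the naming inside the file is at least self-consistent. Whether the file names match what is actually in the project folder has to be checked by hand.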


I stopped BOINC and restarted it, but I got the same error using the old radac_1.69:

Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2715720,1259,2013-1-13 3:49:4,513,f,381683167
14:49:14 (7251): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2726770,1264,2013-1-13 3:49:15,513,n,381683167
14:49:25 (7257): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2737830,1271,2013-1-13 3:49:26,513,n,381683167
14:49:36 (7335): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2748870,1272,2013-1-13 3:49:37,513,n,381683167
14:49:47 (7338): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2759930,1280,2013-1-13 3:49:48,513,n,381683167
14:49:58 (7342): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2771000,1293,2013-1-13 3:50:0,513,n,381683167
14:50:09 (7347): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2782050,1297,2013-1-13 3:50:11,513,n,381683167
14:50:20 (7437): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2793100,1304,2013-1-13 3:50:22,513,n,381683167
14:50:32 (7452): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2804140,1312,2013-1-13 3:50:33,513,n,381683167
14:50:43 (7461): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2815210,1318,2013-1-13 3:50:44,513,n,381683167
14:50:54 (7471): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2826280,1325,2013-1-13 3:50:55,513,n,381683167
14:51:05 (7479): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2837450,1328,2013-1-13 3:51:6,513,n,381683167
14:51:16 (7544): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2848370,1330,2013-1-13 3:51:17,513,n,381683167
14:51:27 (7568): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2858480,1332,2013-1-13 3:51:27,513,n,381683167
14:51:37 (7570): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2869560,1342,2013-1-13 3:51:38,513,n,381683167
14:51:48 (7572): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2880640,1346,2013-1-13 3:51:49,513,n,381683167
14:51:59 (7576): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2891730,1348,2013-1-13 3:52:1,513,n,381683167
14:52:10 (7580): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2902750,1353,2013-1-13 3:52:12,513,n,381683167
14:52:21 (7658): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2913810,1363,2013-1-13 3:52:23,513,n,381683167
14:52:32 (7661): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2924850,1365,2013-1-13 3:52:34,513,n,381683167
14:52:44 (7669): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2935910,1368,2013-1-13 3:52:45,513,n,381683167
14:52:55 (7671): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2946990,1370,2013-1-13 3:52:56,513,n,381683167
14:53:06 (7673): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
2957340,1372,2013-1-13 3:53:6,513,n,381683167

Profile TJM
Project administrator
Project developer
Project tester
Joined: 16 Apr 11
Posts: 291
Credit: 1,382,673
RAC: 45

Message 1584 - Posted: 14 Jan 2013 | 16:33:01 UTC - in response to Message 1582.

Do you run other projects or something CPU-intensive on the same machine ?
The "no heartbeat" issue often pops up under high load due to problems with communication between core client and the app.
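
To see how often the app is being killed, the "No heartbeat" lines in a saved stderr.txt can be timed against each other. This is a rough sketch, assuming Python 3 and assuming the line format seen in the logs in this thread (`HH:MM:SS (pid): No heartbeat from client ...` or `... from core client ...`). `heartbeat_exit_gaps` is a hypothetical helper name.

```python
import re

# Matches both wordings seen in this thread: "from client" and "from core client".
HEARTBEAT_RE = re.compile(
    r"^(\d{1,2}):(\d{2}):(\d{2}) \((\d+)\): No heartbeat from (?:core )?client"
)


def heartbeat_exit_gaps(stderr_text):
    """Return the seconds between consecutive 'No heartbeat' exits."""
    times = []
    for line in stderr_text.splitlines():
        match = HEARTBEAT_RE.match(line.strip())
        if match:
            hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
            times.append(hours * 3600 + minutes * 60 + seconds)
    # The modulo keeps the gap positive if the log crosses midnight.
    return [(later - earlier) % 86400 for earlier, later in zip(times, times[1:])]


# First three exits from the log in Message 1581.
EXCERPT = """14:04:02 (2725): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
14:04:12 (2793): No heartbeat from client for 30 sec - exiting
Radac $Rev: 560 $ starting...
14:04:24 (2832): No heartbeat from client for 30 sec - exiting"""

if __name__ == "__main__":
    print(heartbeat_exit_gaps(EXCERPT))
```

A long list of short, regular gaps points at something systematic, while a single isolated exit looks more like a brief load spike.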

Profile ChertseyAl
Joined: 16 Jun 11
Posts: 195
Credit: 687,656
RAC: 94

Message 1585 - Posted: 14 Jan 2013 | 18:46:27 UTC
Last modified: 14 Jan 2013 | 18:48:00 UTC

I've just had a 'no heartbeat' error for the first time ever.

http://radioactiveathome.org/boinc/result.php?resultid=1380523


{snip}
39502889,10190,2013-1-13 20:9:0,513,n,381683167
39543874,10202,2013-1-13 20:9:41,513,n,381683167
39584858,10218,2013-1-13 20:10:22,513,n,381683167
20:08:45 (2020): No heartbeat from core client for 30 sec - exiting
Radac $Rev: 558 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
39729364,10250,2013-1-13 20:10:30,513,n,381683167
39770380,10262,2013-1-13 20:11:11,513,n,381683167
39811364,10269,2013-1-13 20:11:52,513,n,381683167
39852349,10280,2013-1-13 20:12:33,513,n,381683167
{snip}


The WU did run for the full 24 hours, but the reported time is wrong.

The odd thing is that it was granted the usual amount of credit, but my RAC took a dive. I'd only just got back the number 1 computer place yesterday *cries* ;)

FWIW I run QCN, WUProp and either DistRTGen or Primaboinca. Nobody uses the machine (no keyboard, monitor or mouse), I just VNC into it a couple of times a day to check that it's still alive.

Cheers,

Al.

Edited to remove the inevitable stray closing URL tag :)

Profile ChertseyAl
Joined: 16 Jun 11
Posts: 195
Credit: 687,656
RAC: 94

Message 1586 - Posted: 14 Jan 2013 | 19:20:27 UTC - in response to Message 1585.

Oh dear, just noticed that my sensor seems to have flatlined :( I'll have to see if the hardware has packed up. Maybe just needs a reboot. Whatever, CBA tonight, I'll take a look tomorrow.

Cheers,

Al.


Profile ChertseyAl
Joined: 16 Jun 11
Posts: 195
Credit: 687,656
RAC: 94

Message 1587 - Posted: 14 Jan 2013 | 20:41:04 UTC - in response to Message 1586.

Oh, it's just woken up again. Must be server-side.

Cheers,

Al.


Profile Dingo
Joined: 16 Jun 11
Posts: 48
Credit: 438,488
RAC: 85

Message 1588 - Posted: 15 Jan 2013 | 9:54:35 UTC - in response to Message 1584.

Do you run other projects or something CPU-intensive on the same machine ?
The "no heartbeat" issue often pops up under high load due to problems with communication between core client and the app.



Yes, that computer also runs other BOINC projects. I will retry it and leave a CPU core free for this app.

Profile Dingo
Joined: 16 Jun 11
Posts: 48
Credit: 438,488
RAC: 85

Message 1589 - Posted: 15 Jan 2013 | 10:00:49 UTC - in response to Message 1588.
Last modified: 15 Jan 2013 | 10:19:18 UTC

I suspended all other tasks in BOINC and still get the error. I will run down all the work, uninstall BOINC, then try the standard BOINC and radac and see what happens.

Does the Linux app work on 64-bit Linux? Is that why I built my own?

Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
31700,15,2013-1-15 9:56:42,513,f,381683167
20:56:51 (8417): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
42700,18,2013-1-15 9:56:53,513,n,381683167
20:57:03 (8420): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
53780,23,2013-1-15 9:57:4,513,n,381683167
20:57:14 (8494): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
63890,26,2013-1-15 9:57:14,513,n,381683167
20:57:24 (8506): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
74950,31,2013-1-15 9:57:25,513,n,381683167
20:57:35 (8522): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
86040,34,2013-1-15 9:57:36,513,n,381683167
20:57:46 (8530): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
97080,40,2013-1-15 9:57:47,513,n,381683167
20:57:57 (8538): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
108150,48,2013-1-15 9:57:58,513,n,381683167
20:58:08 (8545): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
118530,52,2013-1-15 9:58:9,513,n,381683167
20:58:18 (8625): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
129600,59,2013-1-15 9:58:20,513,n,381683167
20:58:30 (8632): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
140680,63,2013-1-15 9:58:31,513,n,381683167
20:58:41 (8641): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
151720,64,2013-1-15 9:58:42,513,n,381683167
20:58:52 (8654): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
162800,70,2013-1-15 9:58:53,513,n,381683167
20:59:03 (8663): No heartbeat from client for 30 sec - exiting
Radac $Rev: 440 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
173860,77,2013-1-15 9:59:4,513,n,381683167

Profile ChertseyAl
Joined: 16 Jun 11
Posts: 195
Credit: 687,656
RAC: 94

Message 1590 - Posted: 15 Jan 2013 | 10:28:03 UTC - in response to Message 1587.

Something odd with this current WU. Was due to finish over an hour ago, but is still running and looks like it will over-run by 6 hours. Also, yesterday, my graph flatlined for a while.

Grabbed a copy of the stderr.txt and I see this:

Radac $Rev: 558 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
199053,43,2013-1-14 9:11:39,513,f,381683167
240147,51,2013-1-14 9:12:20,513,n,381683167
281132,62,2013-1-14 9:13:0,513,n,381683167

[snip, all records with 'n']

14223632,3738,2013-1-14 13:5:34,513,n,381683167
14264632,3753,2013-1-14 13:6:15,513,n,381683167
14308847,3769,2013-1-14 13:6:59,513,n,381683167
14513629,3825,2013-1-14 13:10:24,513,r,381683167
14703381,3862,2013-1-14 13:13:34,513,r,381683167
14903168,3904,2013-1-14 13:16:54,513,r,381683167

[snip, all records with 'r']

35649450,9394,2013-1-14 19:3:4,513,r,381683167
35854200,9448,2013-1-14 19:6:29,513,r,381683167
35961937,9478,2013-1-14 19:8:17,513,r,381683167
36005106,9494,2013-1-14 19:9:0,513,n,381683167
36047699,9500,2013-1-14 19:9:43,513,n,381683167
36088682,9510,2013-1-14 19:10:24,513,n,381683167

[snip, all records with 'n']

So basically 6 hours was 'lost' with the 'r' type records. Note that the sensor was displaying sensible values and updating as normal during this period.

What is the significance of 'n' and 'r'?

Cheers,

Al.


Profile Dingo
Joined: 16 Jun 11
Posts: 48
Credit: 438,488
RAC: 85

Message 1591 - Posted: 16 Jan 2013 | 2:55:52 UTC - in response to Message 1590.
Last modified: 16 Jan 2013 | 2:57:35 UTC

@ChertseyAl

I thought I had seen that somewhere on this site, so I looked and found the sample types here: http://radioactiveathome.org/boinc/forum_thread.php?id=60&nowrap=true#574


The live database is limited up to 31 days back (it is cleared from time to time).
There are currently 3 types of samples: "f" - first one read from sensor at the app start, "n" - normal sample taken when app is running, "r" - sample taken when the app was resumed/restarted. Both "f" and "r" may contain "long" samples taken for example when the PC was left in standby and the sensor was running.
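
As a quick way to see how a work unit splits between the three sample types, the records in a saved stderr.txt can be tallied. This is a sketch assuming Python 3 and assuming each record has six comma-separated fields with the type letter in the fifth, as in the logs pasted in this thread. The meaning of the other fields is not stated here, so the code does not interpret them. `tally_sample_types` is a hypothetical helper name.

```python
import re
from collections import Counter

# Six comma-separated fields; the fifth is the sample type (f, n or r).
SAMPLE_RE = re.compile(
    r"^(\d+),(\d+),(\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}),(\d+),([fnr]),(\d+)$"
)

# Descriptions taken from the quoted post above.
SAMPLE_MEANING = {
    "f": "first sample read at app start",
    "n": "normal sample while the app is running",
    "r": "sample taken when the app was resumed/restarted",
}


def tally_sample_types(stderr_text):
    """Count records per sample type, skipping every non-record line."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            counts[match.group(5)] += 1
    return counts


# A few lines copied from Message 1590.
EXCERPT = """Radac $Rev: 558 $ starting...
sensors.xml: 6 nodes found
Found sensor v2.01
199053,43,2013-1-14 9:11:39,513,f,381683167
240147,51,2013-1-14 9:12:20,513,n,381683167
281132,62,2013-1-14 9:13:0,513,n,381683167
14513629,3825,2013-1-14 13:10:24,513,r,381683167"""

if __name__ == "__main__":
    for kind, count in sorted(tally_sample_types(EXCERPT).items()):
        print(kind, count, SAMPLE_MEANING[kind])
```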


Cheers

Profile TJM
Project administrator
Project developer
Project tester
Joined: 16 Apr 11
Posts: 291
Credit: 1,382,673
RAC: 45

Message 1592 - Posted: 16 Jan 2013 | 16:29:51 UTC - in response to Message 1591.

"r" usually means the app was suspended, client restarted or the system was very busy and API call took a long time.
In the log pasted above I can clearly see there was some issue: the time between reads is around 200 seconds instead of around 40.
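
The slow reads can be picked out of a log mechanically by differencing the timestamps of consecutive records. A minimal sketch, assuming Python 3, the six-field record layout seen in this thread (timestamp in the third field, type in the fifth), and a hypothetical 60-second threshold for calling a read "slow":

```python
from datetime import datetime

MAX_EXPECTED_GAP = 60  # seconds; hypothetical cut-off, normal reads are around 40 s apart


def parse_timestamp(text):
    """Parse the non-zero-padded 'YYYY-M-D H:M:S' timestamps used in the log."""
    date_part, time_part = text.split(" ")
    year, month, day = (int(x) for x in date_part.split("-"))
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(year, month, day, hour, minute, second)


def read_gaps(sample_lines):
    """Return (type, seconds since previous record) for each record after the first."""
    rows = []
    for line in sample_lines:
        fields = line.strip().split(",")
        if len(fields) != 6:
            continue  # skip banner and heartbeat lines
        rows.append((parse_timestamp(fields[2]), fields[4]))
    return [
        (kind, int((when - prev_when).total_seconds()))
        for (prev_when, _), (when, kind) in zip(rows, rows[1:])
    ]


# Records copied from Message 1590, around the switch from 'n' to 'r'.
EXCERPT = [
    "14264632,3753,2013-1-14 13:6:15,513,n,381683167",
    "14308847,3769,2013-1-14 13:6:59,513,n,381683167",
    "14513629,3825,2013-1-14 13:10:24,513,r,381683167",
    "14703381,3862,2013-1-14 13:13:34,513,r,381683167",
    "14903168,3904,2013-1-14 13:16:54,513,r,381683167",
]

if __name__ == "__main__":
    for kind, gap in read_gaps(EXCERPT):
        flag = "SLOW" if gap > MAX_EXPECTED_GAP else "ok"
        print(kind, gap, flag)
```

On this excerpt every 'r' record arrives more than three minutes after the previous one, which lines up with the figure of roughly 200 seconds quoted above.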

I think there is a bug either in the latest BOINC libraries/API (the funny thing is that an upgrade was actually recommended because of bugs) or in the latest core clients/managers. I have to dig into the data a bit to see what's wrong.
I think I'll release an updated app with the ability to print some info about BOINC API calls in debug mode.



Profile TJM
Project administrator
Project developer
Project tester
Joined: 16 Apr 11
Posts: 291
Credit: 1,382,673
RAC: 45

Message 1594 - Posted: 16 Jan 2013 | 16:53:06 UTC - in response to Message 1592.
Last modified: 16 Jan 2013 | 16:56:01 UTC

ChertseyAl - if you experience any issues, please reduce the runtime to at most 8 hours. The stderr holds only up to 64 kB of data, so after a 24-hour run it is incomplete and problems are a lot harder to diagnose.
Btw, "no heartbeat" every now and then is nothing unusual, it happens sometimes when the machine has a brief load spike.
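
A back-of-the-envelope check of the 64 kB limit, as a sketch under stated assumptions: 64 kB is taken as 65,536 bytes, every line is assumed to be about as long as the sample record below (copied from Message 1585), and one record is assumed to be written every 41 seconds, which is the spacing of the timestamps in that message. Banner and heartbeat lines are ignored, so the real figure would be somewhat lower.

```python
STDERR_LIMIT_BYTES = 64 * 1024  # assumption: "64kB" means 65,536 bytes
SAMPLE_LINE = "39502889,10190,2013-1-13 20:9:0,513,n,381683167"  # from Message 1585
SECONDS_PER_SAMPLE = 41  # assumption: spacing between records in Message 1585


def hours_of_samples(limit_bytes, line, seconds_per_sample):
    """Estimate how many hours of records fit in a size-limited stderr."""
    bytes_per_line = len(line) + 1  # +1 for the newline
    lines_that_fit = limit_bytes // bytes_per_line
    return lines_that_fit * seconds_per_sample / 3600


if __name__ == "__main__":
    hours = hours_of_samples(STDERR_LIMIT_BYTES, SAMPLE_LINE, SECONDS_PER_SAMPLE)
    print(f"about {hours:.1f} hours of records fit in stderr")
```

Under these assumptions the buffer fills in roughly 15 to 16 hours, so a 24-hour work unit cannot keep its full log while an 8-hour one fits with room to spare.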



Copyright © 2024 BOINC@Poland | Open Science for the future