Node Data Integrity Problem
Support staff receive an email notification that "file is older than 15 minutes".
Running ps -efw | grep vwpoint shows that few, if any, of the agent processes are running. However, the jobInitiater and serviceQueue processes are still running.
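The process check above can be made repeatable with a small helper. This is only a sketch: it assumes that every vwpoint line other than jobInitiater and serviceQueue is an agent process, since the agent process names are not listed here. The function name count_node_procs is made up for illustration.

```shell
# Summarize node processes from already-filtered `ps -efw` output on stdin.
# Assumption: any non-empty line that is neither jobInitiater nor
# serviceQueue is counted as an agent process.
count_node_procs() {
    awk '
        /jobInitiater/ { init++; next }
        /serviceQueue/ { queue++; next }
        NF             { agents++ }
        END {
            printf "jobInitiater=%d serviceQueue=%d agents=%d\n", init, queue, agents
        }
    '
}

# Usage on a node (the [v] keeps grep from matching its own command line):
#   ps -efw | grep '[v]wpoint' | count_node_procs
```

An agents count at or near zero while jobInitiater and serviceQueue are both non-zero matches the symptom described above.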
First, examine the jobInit.log file under $VIEWPOINT/log. (If logging is not enabled, this file may not exist, or it may be left over from an earlier run. In either case, go to the bin folder, edit the startNetWatchNode.sh script to enable logging, then restart the node software.)
If any of the quarter-hour check cycles are not completing, this typically indicates a data integrity problem in the database.
- First, look for "Done with this round of data" near the top of the hour check cycle.
- Then search backwards for "Get next service from DB (779) of (934)". The first number is the current row and the second is the total number of rows to process. If the two numbers are not equal, the check cycle is not completing. This is almost always due to data integrity problems.
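The row-count comparison can also be done from the command line. The sketch below assumes the log lines look exactly like the message quoted above. The function name last_cycle_progress is hypothetical.

```shell
# Print "CURRENT TOTAL STATUS" for the last "Get next service from DB" line
# in the given log file. STATUS is COMPLETE when the two numbers match.
# Prints nothing if the log has no such line.
last_cycle_progress() {
    grep 'Get next service from DB (' "$1" | tail -n 1 |
        sed -n 's/.*Get next service from DB (\([0-9][0-9]*\)) of (\([0-9][0-9]*\)).*/\1 \2/p' |
        awk '{ print $1, $2, ($1 == $2 ? "COMPLETE" : "INCOMPLETE") }'
}

# Usage:
#   last_cycle_progress "$VIEWPOINT/log/jobInit.log"
```

An INCOMPLETE result while a cycle is still in progress is normal. It points to a problem only if the same row number is still reported after the cycle should have finished.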
Then search backwards for "Got service data from DB (HTTP)(20799)" followed by "DIS-CONNECTED FROM DATABASE". Look at the text in parentheses: "HTTP" is the service type and "20799" is the service ID. This is where the problem lies in the database. More than likely, there is no record for 20799 in the httpservice table on the node database.
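For a long log, the suspect service can be pulled out with awk instead of a manual search. This is a sketch under two assumptions that should be checked against a real log: the failing "Got service data" line is the most recent one before the disconnect message, and a cycle that logs "Done with this round of data" completed normally. The function name find_failing_services is made up for illustration.

```shell
# Print "TYPE ID" for each "Got service data from DB (TYPE)(ID)" entry that
# is the most recent such entry before "DIS-CONNECTED FROM DATABASE".
find_failing_services() {
    awk '
        /Got service data from DB \(/ {
            s = $0
            sub(/.*Got service data from DB \(/, "", s)   # s is now TYPE)(ID)...
            type = s
            sub(/\).*/, "", type)                         # keep text before first )
            id = s
            sub(/^[^(]*\(/, "", id)                       # drop up to second (
            sub(/\).*/, "", id)                           # keep text before next )
            last = type " " id
            next
        }
        /Done with this round of data/ { last = "" }      # cycle finished cleanly
        /DIS-CONNECTED FROM DATABASE/ && last != "" { print last; last = "" }
    ' "$1"
}

# Usage:
#   find_failing_services "$VIEWPOINT/log/jobInit.log"
```

Treat each line of output as a candidate only, and confirm it in the database before deleting anything.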
Using sqlplus, query this record from the appropriate service table. If it does NOT exist, remove this serviceid from the master servicelist table on the node database, deleting its servicenode rows first:
SQL> select * from servicelist where serviceid=20799;

 SERVICEID SERVICETYPE     CHANGE D M
---------- ----------- ---------- - -
     20799           4          0 N N

SQL> select * from servicenode where serviceid=20799;

 SERVICEID     NODEID EXPIREDAT   INTERVAL
---------- ---------- --------- ----------
     20799          2 01-JAN-25         15
     20799          4 01-JAN-25         15
     20799          9 01-JAN-25         15
     20799         32 01-JAN-25         15
     20799         33 01-JAN-25         15

SQL> delete from servicenode where serviceid=20799;

5 rows deleted.

SQL> delete from servicelist where serviceid=20799;

1 row deleted.

SQL> commit;

Commit complete.
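Before deleting anything, it can help to generate read-only count queries for the suspect ID and paste them into sqlplus. The sketch below only prints SQL and does not connect to the database. The function name print_service_checks is hypothetical, and the service table (for example httpservice for an HTTP service) must be supplied by the operator.

```shell
# Print read-only count queries for one service id.
# $1 = service id (digits only), $2 = service-type table, e.g. httpservice.
print_service_checks() {
    id=$1
    table=$2
    case $id in
        ''|*[!0-9]*) echo "service id must be numeric: $id" >&2; return 1 ;;
    esac
    case $table in
        ''|*[!A-Za-z0-9_]*) echo "bad table name: $table" >&2; return 1 ;;
    esac
    for t in "$table" servicenode servicelist; do
        printf 'select count(*) from %s where serviceid=%s;\n' "$t" "$id"
    done
}

# Usage:
#   print_service_checks 20799 httpservice
```

A count of 0 in the service table together with a non-zero count in servicelist is the orphaned-record case described above.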
Here's an example of how to clean up an HTTP Transaction:
SQL> delete from tservicelist where serviceid=22224;

9 rows deleted.

SQL> delete from httptservice where serviceid=22224;

1 row deleted.

SQL> delete from servicenode where serviceid=22224;

5 rows deleted.

SQL> delete from servicelist where serviceid=22224;

1 row deleted.

SQL> commit;

Commit complete.

SQL> quit
After cleaning up all data integrity problems, you will need to restart the jobInitiater.