This paper studies session identification in web log mining and proposes an improved algorithm based on an average time threshold. The algorithm dynamically calculates the average interval between request records within a session and adjusts the time threshold individually. Compared with the traditional algorithm, which defines a uniform threshold for all users' web pages, it identifies long sessions more accurately. Finally, the algorithm re-identifies the generated candidate session sets, which makes the identified sessions more reasonable and effective. Experimental results show that the quality of session identification is improved.
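The idea of a dynamically adjusted threshold can be illustrated with a minimal sketch. It is not the paper's exact algorithm, which the abstract does not specify. It assumes that a request joins the current session when its gap to the previous request is no larger than a multiple of the average gap observed so far in that session. The function name `split_sessions` and the parameters `initial_threshold`, `factor`, and `min_threshold` are hypothetical and chosen only for illustration. The re-identification step for candidate sessions is not covered.

```python
from typing import List


def split_sessions(
    timestamps: List[float],
    initial_threshold: float = 600.0,  # assumed threshold (seconds) before any average exists
    factor: float = 3.0,               # assumed multiple of the running average gap
    min_threshold: float = 60.0,       # assumed floor so fast clickers are not over-split
) -> List[List[float]]:
    """Split one user's request timestamps into sessions.

    The threshold is recomputed for every request from the average
    interval between the requests already in the current session.
    """
    sessions: List[List[float]] = []
    current: List[float] = []
    gap_sum = 0.0  # sum of the intervals inside the current session

    for t in sorted(timestamps):
        if not current:
            current = [t]
            gap_sum = 0.0
            continue

        gap = t - current[-1]
        n_gaps = len(current) - 1
        if n_gaps == 0:
            # Only one request so far: no average yet, use the initial value.
            threshold = initial_threshold
        else:
            threshold = max(min_threshold, factor * gap_sum / n_gaps)

        if gap <= threshold:
            current.append(t)
            gap_sum += gap
        else:
            # Gap is too large relative to this session's own pace: close it.
            sessions.append(current)
            current = [t]
            gap_sum = 0.0

    if current:
        sessions.append(current)
    return sessions


# A fast browsing burst followed by a long pause is split in two.
fast_user = split_sessions([0, 10, 20, 30, 500, 510])

# A slow reader with gaps of 400 to 500 seconds stays in one long session,
# which a small fixed threshold (for example 300 seconds) would break apart.
slow_user = split_sessions([0, 400, 800, 1200, 1700])
```

Under these assumptions, the threshold follows each session's own pace, which is why a long session with consistently large intervals is kept together.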