Defeating the MOSS Plagiarism Detection System

Deploying MOSS locally

How MOSS works

https://github.com/genchang1234/How-to-cheat-in-computer-science-101

http://www.doc88.com/p-991529643125.html

Winnowing: Local Algorithms for Document Fingerprinting
http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf

https://www.zhihu.com/question/27085271

Basic principles

  • Renaming variables is of no use.
  • Whitespace insensitivity
    In matching text files, matches should be unaffected by such things as extra whitespace, capitalization, punctuation, etc. In other domains the notion of what strings should be equal is different; for example, in matching software text it is desirable to make matching insensitive to variable names.
  • Noise suppression
    Discovering short matches, such as the fact that the word "the" appears in two different documents, is uninteresting. Any match must be large enough to imply that the material has been copied and is not simply a common word or idiom of the language in which documents are written.
  • Position independence
    Coarse-grained permutation of the contents of a document (e.g., scrambling the order of paragraphs) should not affect the set of discovered matches. Adding to a document should not affect the set of matches in the original portion of the new document. Removing part of a document should not affect the set of matches in the portion that remains.
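
The three properties above can be illustrated with a minimal sketch of winnowing as described in the paper linked earlier. This is not MOSS's actual implementation: the tokenizer in `normalize`, the CRC32 hash, and the parameters `k=5` and `w=4` are all assumptions chosen for illustration. Identifiers are collapsed to a single symbol and whitespace is dropped (whitespace insensitivity), matches shorter than `k` characters are never seen (noise suppression), and fingerprints are compared as a set of hash values (position independence).

```python
import keyword
import re
import zlib


def normalize(source: str) -> str:
    """Collapse every non-keyword identifier to 'V' and drop all whitespace."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    out = []
    for tok in tokens:
        is_name = re.fullmatch(r"[A-Za-z_]\w*", tok) is not None
        out.append("V" if is_name and not keyword.iskeyword(tok) else tok)
    return "".join(out)


def kgram_hashes(text: str, k: int) -> list:
    """Hash every substring of length k (CRC32 is an arbitrary, deterministic choice)."""
    return [zlib.crc32(text[i:i + k].encode()) for i in range(len(text) - k + 1)]


def winnow(hashes: list, w: int) -> set:
    """Keep the minimum hash of each window of w hashes (rightmost on ties)."""
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum inside the window.
        offset = w - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def fingerprint(source: str, k: int = 5, w: int = 4) -> set:
    """Return the set of selected hash values, ignoring their positions."""
    return {h for h, _ in winnow(kgram_hashes(normalize(source), k), w)}


original = "total = price + tax\nprint(total)"
renamed = "x   =   y + z\nprint( x )"
print(fingerprint(original) == fingerprint(renamed))
```

With window size `w` and k-gram length `k`, the paper guarantees that any shared substring of length at least `w + k - 1` contributes at least one common fingerprint, which is why renaming and reformatting alone leave the fingerprint set unchanged.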

Safe range

Typically, a potential case of plagiarism will result in over 50% matching, while a non-plagiarism case will be around 10-15% matching.

Effective approaches

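  1. IDE refactoring
  2. Split code line by line
  3. Use a different implementation of the same logic
  4. Eliminate lambda expressions
  5. Build a large meaningless loop (with an if nested inside it) that yields 0 or an empty string, and add that value to the final return result (planting redundant code that pollutes the data flow)
  6. Modify constant values; break strings apart into operations ("ab" = "a" + "b")
  7. Invert if conditions: change if A to if !A
  8. Convert object-oriented code to procedural code
  9. Important: split single lines into multiple lines, and merge multiple lines into a single line
  10. Meaningless function parameters: if the original function takes four parameters, change it to take five, where the fifth is meaningless
  11. Split very long functions into several independent short functions. This makes the code look cleaner and more logically structured. It is not only a way to plagiarize but also a good coding habit.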

Logic inversion: a is not None -> not a is None, a>b -> b+0<a+7*zero
Redundant logic: a and b and True or False
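
Rewrites like the ones above only preserve behaviour under certain conditions, so it is worth checking them mechanically. The sketch below assumes that `zero` in the note is a variable holding `0` and that `a` and `b` are integers; the helper name `check_equivalences` is hypothetical. It also shows one pitfall: `a and b and True or False` always yields a `bool`, whereas `a and b` yields one of its operands, so the two agree only when the result is used as a truth value.

```python
from itertools import product

zero = 0  # assumption: `zero` in the note is simply a variable equal to 0


def check_equivalences() -> bool:
    """Compare each rewritten expression with its original on sample inputs."""
    # Identity test: valid for any object, including None.
    for a in [None, 0, 1, "", "x"]:
        assert (a is not None) == (not a is None)

    for a, b in product([-3, 0, 1, 2], repeat=2):
        # Comparison rewritten with no-op arithmetic.
        assert (a > b) == (b + 0 < a + 7 * zero)
        # Equality rewritten as a negated inequality.
        assert (a == b) == (not a != b)
        # Redundant logic: equal only as truth values.
        assert bool(a and b) == (a and b and True or False)
    return True


print(check_equivalences())
# The value itself differs: the original keeps the operand, the rewrite does not.
print((1 and 2), (1 and 2 and True or False))
```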

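  1. Add pre-checks up front
  2. Add meaningless parameters to functions
  3. Break constants apart into arithmetic expressions
  4. Pull parts of a condition out into variables
  5. Black-hole function

    def blackhole(a, b=None, c=None, d=None):
        # Returns the first argument unchanged; b, c and d are ignored.
        return a

  6. for-enumerate

    for i in range(0, 10): ...                  # before
    for idx, i in enumerate(range(0, 10)): ...  # after

  7. String function

    'a'       # before
    str('a')  # after

  8. Logical expression conversion

    a == b        # before
    not a != b    # after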

A different abstract syntax tree
A different control-flow graph and data-flow graph
A different semantic vector
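
The first of these can be inspected directly with Python's standard `ast` module. The sketch below is only an illustration of what "a different tree" means; the notes do not say that MOSS compares syntax trees, and the helper name `tree` is hypothetical. Whitespace changes leave the tree untouched, while rewriting `a == b` as `not a != b` wraps the comparison in a `UnaryOp` node.

```python
import ast


def tree(expr: str) -> str:
    """Return a textual dump of the syntax tree of a single expression."""
    return ast.dump(ast.parse(expr, mode="eval").body)


print(tree("a == b"))
print(tree("not a != b"))
print(tree("a == b") == tree("a==b"))
```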